# MovieLens-Recommender
[MovieLens-Recommender][1] is a pure Python implementation of ``Collaborative Filtering``. It contains ``User Based Collaborative Filtering (UserCF)`` and ``Item Based Collaborative Filtering (ItemCF)``. For comparison, ``Random Based Recommendation`` and ``Most-Popular Based Recommendation`` are also included. The well-known ``Latent Factor Model (LFM)`` is included in this repo, too.
The built-in datasets are ``Movielens-1M`` and ``Movielens-100k``, but you can also use custom datasets.
Besides, there are two models named ``UserCF-IIF`` and ``ItemCF-IUF``, which improve on ``UserCF`` and ``ItemCF``. They reduce the influence of very popular items (IIF, inverse item frequency) and very active users (IUF, inverse user frequency).
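As an illustration of the IUF idea, here is a minimal sketch of an item-item similarity in which each co-rating is weighted by `1 / log(1 + |N(u)|)`, so that very active users contribute less. The function name `item_similarity_iuf` and the `user_items` input format are assumptions made for this example, not necessarily how this repository implements it.

```python
import math
from collections import defaultdict
from itertools import permutations


def item_similarity_iuf(user_items):
    """Item-item similarity with an IUF penalty for very active users.

    user_items maps each user id to the set of item ids that user rated
    (a hypothetical input format chosen for this sketch).
    """
    item_count = defaultdict(int)                       # |N(i)|: users who rated item i
    co_rated = defaultdict(lambda: defaultdict(float))  # penalized co-occurrence counts

    for items in user_items.values():
        # The more items a user rated, the smaller that user's contribution.
        weight = 1.0 / math.log(1.0 + len(items))
        for i in items:
            item_count[i] += 1
        for i, j in permutations(items, 2):
            co_rated[i][j] += weight

    # Normalize by sqrt(|N(i)| * |N(j)|), as in cosine-style similarity.
    return {
        i: {j: c / math.sqrt(item_count[i] * item_count[j])
            for j, c in related.items()}
        for i, related in co_rated.items()
    }
```

UserCF-IIF follows the same pattern with the roles of users and items swapped: each shared item is weighted by `1 / log(1 + |N(i)|)`.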
# Overview
The book 《[推荐系统实践](https://book.douban.com/subject/10769749/)》 (*Recommender System Practice*) written by *Xiang Liang* is an excellent introduction for people who don't know much about recommender systems. But the book only offers function-level implementations of ``Collaborative Filtering``. A well-architected project with dataset building and model validation is still needed.
So I made the [MovieLens-Recommender][1] project, a pure Python implementation of ``Collaborative Filtering`` based on the ideas of the book.
This repository is based on [MovieLens-RecSys][2], which is also a good implementation of ``Collaborative Filtering``, but it runs very slowly.
Besides, [Surprise][3] is a very popular Python *scikit* for building and analyzing recommender systems. So I combined the advantages of these two projects, and here comes ``MovieLens-Recommender``.
My recommendation system works in four steps:
- Create trainset and testset
- Train a recommender model
- Give recommendations
- Evaluate results
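The first step above can be sketched in pure Python as follows. This is only an illustration: the function name `split_ratings`, the `(user, item, rating)` tuple format, and the nested-dict layout are assumptions for this example and may differ from the repository's own code.

```python
import random


def split_ratings(ratings, test_size=0.1, seed=0):
    """Randomly split (user, item, rating) tuples into trainset and testset.

    Both results are nested dicts of the form {user: {item: rating}}.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    trainset, testset = {}, {}
    for user, item, rating in ratings:
        # Each rating goes to the testset with probability test_size.
        target = testset if rng.random() < test_size else trainset
        target.setdefault(user, {})[item] = rating
    return trainset, testset
```

Because the split is made per rating, the actual test fraction is only approximately `test_size`.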
At the end of a recommendation process, four metrics are reported to measure the recommendation model:
- Precision
- Recall
- Coverage
- Popularity
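The four metrics can be computed roughly as in the sketch below, which follows the usual top-N definitions: precision and recall from hits against the testset, coverage as the share of items ever recommended, and popularity as the mean of `log(1 + popularity)` over recommended items. The function name `evaluate` and its argument layout are assumptions made for this example.

```python
import math


def evaluate(recommendations, testset, item_popularity, n_items):
    """Compute precision, recall, coverage and popularity for top-N lists.

    recommendations: {user: [recommended item ids]}
    testset:         {user: set of held-out item ids}
    item_popularity: {item: number of ratings in the trainset}
    n_items:         total number of distinct items in the trainset
    """
    hits = rec_count = test_count = 0
    recommended_items = set()
    popularity_sum = 0.0

    for user, rec_items in recommendations.items():
        test_items = testset.get(user, set())
        for item in rec_items:
            if item in test_items:
                hits += 1
            recommended_items.add(item)
            popularity_sum += math.log(1 + item_popularity[item])
        rec_count += len(rec_items)
        test_count += len(test_items)

    return {
        'precision': hits / rec_count if rec_count else 0.0,
        'recall': hits / test_count if test_count else 0.0,
        'coverage': len(recommended_items) / n_items if n_items else 0.0,
        'popularity': popularity_sum / rec_count if rec_count else 0.0,
    }
```

A lower popularity value means the model recommends more long-tail items, which is why the Random baseline scores lowest on popularity and highest on coverage.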
**No third-party Python packages (e.g. NumPy/pandas) are needed!**
# Getting started
**1. Download**
``Git`` is awesome~
```shell
git clone https://github.com/fuxuemingzhu/MovieLens-Recommender.git
```
`Movielens-1M` and `Movielens-100k` datasets are under the `data/` folder.
**2. Run**
The configuration is in `main.py`. Please choose the dataset and model you want to use and set a proper `test_size`. The default values in `main.py` are shown below:
```python
dataset_name = 'ml-100k'
# dataset_name = 'ml-1m'
# model_type = 'UserCF'
# model_type = 'UserCF-IIF'
# model_type = 'ItemCF'
# model_type = 'Random'
# model_type = 'MostPopular'
model_type = 'ItemCF-IUF'
# model_type = 'LFM'
test_size = 0.1
```
Then run ``python main.py`` on your command line. A recommendation model will be built on the dataset you chose above.
Note: my code has only been tested on Python 3, so Python 3 is preferred.
```shell
python main.py
# python3 main.py
```
If you are using Linux, this command will redirect the whole output into a file.
```shell
python main.py > run.log 2>&1 &
# python3 main.py > run.log 2>&1 &
```
This command runs in the background. You can wait for the result, or use `tail -f run.log` to watch the output in real time.
All models will be saved to the `model/` folder, so the next run will take less time.
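The load-or-train behavior described above can be sketched like this. It assumes the model is serialized with the standard `pickle` module; the helper name `load_or_train` and the file path used with it are hypothetical, and the repository may store models differently.

```python
import os
import pickle


def load_or_train(model_path, train_fn):
    """Return a cached model if one exists, otherwise train and save it.

    model_path: where the pickled model lives (hypothetical, e.g. under model/)
    train_fn:   a zero-argument function that trains and returns the model
    """
    if os.path.exists(model_path):
        with open(model_path, 'rb') as f:
            return pickle.load(f)

    model = train_fn()
    # Create the target folder on first use.
    os.makedirs(os.path.dirname(model_path) or '.', exist_ok=True)
    with open(model_path, 'wb') as f:
        pickle.dump(model, f)
    return model
```

With a cache like this, a saved model must be deleted by hand after changing the dataset or `test_size`, otherwise the stale model is reused.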
**3. Output**
Here is an example run result of the ItemCF model trained on ml-1m with test_size = 0.10. No matter which model is chosen, the output log will look like this.
```
**********************************************************************
This is ItemCF model trained on ml-1m with test_size = 0.10
**********************************************************************
ItemBasedCF start...
No model saved before.
Train a new model...
counting movies number and popularity...
counting movies number and popularity success.
total movie number = 3693
generate items co-rated similarity matrix...
steps(0), 0.00 seconds have spent..
steps(1000), 18.50 seconds have spent..
steps(2000), 46.39 seconds have spent..
steps(3000), 63.52 seconds have spent..
steps(4000), 87.37 seconds have spent..
steps(5000), 111.83 seconds have spent..
steps(6000), 132.71 seconds have spent..
generate items co-rated similarity matrix success.
total step number is 6040
total 133.61 seconds have spent
calculate item-item similarity matrix...
steps(0), 0.00 seconds have spent..
steps(1000), 1.77 seconds have spent..
steps(2000), 3.47 seconds have spent..
steps(3000), 5.01 seconds have spent..
calculate item-item similarity matrix success.
total step number is 3693
total 5.67 seconds have spent
Train a new model success.
The new model has saved success.
recommend for userid = 1:
['1196', '364', '1265', '318', '2081', '1282', '1198', '2716', '1', '2096']
recommend for userid = 100:
['2916', '1580', '457', '1240', '589', '1291', '780', '1036', '1610', '1214']
recommend for userid = 233:
['1022', '594', '1282', '2087', '2078', '1196', '608', '2081', '593', '1393']
recommend for userid = 666:
['296', '1704', '593', '356', '1196', '589', '1580', '50', '1393', '1']
recommend for userid = 888:
['2916', '457', '480', '2628', '1265', '1610', '1198', '1573', '2762', '1527']
Test recommendation system start...
steps(0), 0.10 seconds have spent..
steps(1000), 291.42 seconds have spent..
steps(2000), 627.60 seconds have spent..
steps(3000), 898.21 seconds have spent..
steps(4000), 1219.94 seconds have spent..
steps(5000), 1523.29 seconds have spent..
steps(6000), 1817.46 seconds have spent..
Test recommendation system success.
total step number is 6040
total 1829.26 seconds have spent
precision=0.1900 recall=0.1147 coverage=0.1673 popularity=7.3911
total Main Function step number is 0
total 1972.49 seconds have spent
```
# Benchmarks
Here are the models' benchmarks on four metrics: Precision, Recall, Coverage, and Popularity. The test_size is 0.1.
These results are nearly the same as those in *Xiang Liang*'s book, which suggests that my implementations are correct.
**Movielens 1M:**
| Movielens 1M | Precision | Recall | Coverage | Popularity |
| :------------------: | --------: | -----: | -------: | ---------: |
| UserCF | 19.84% | 11.97% | 28.16% | 7.2023 |
| ItemCF | 19.00% | 11.47% | 16.73% | 7.3911 |
| UserCF-IIF | 19.77% | 11.93% | 29.62% | 7.1660 |
| ItemCF-IUF | 18.71% | 11.29% | 15.03% | 7.4748 |
| LFM | / | / | / | / |
| Random | 0.54% | 0.33% | 100.00% | 4.4075 |
| Most Popular | 10.59% | 6.39% | 2.76% | 7.7462 |
**Movielens 100k:**
| Movielens 100k | Precision | Recall | Coverage | Popularity |
| :------------------: | --------: | -----: | -------: | ---------: |
| UserCF | 19.69% | 18.50% | 22.20% | 5.4928 |
| ItemCF | 17.89% | 16.80% | 13.23% | 5.6202 |
| UserCF-IIF | 19.57% | 18.38% | 22.74% | 5.4716 |
| ItemCF-IUF | 20.38% | 12.30% | 17.30% | 7.3643 |
| LFM | 20.29% | 19.06% | 27.41% | 4.9983 |
| Random | 0.82% | 0.77% | 99.64% | 3.0332 |
| Most Popular | 10.54% | 9.90% | 4.07% | 5.9578 |
# Notice
UserCF is faster than ItemCF. Using `ml-100k` instead of `ml-1m` will speed up the prediction process.
Calculating the similarity matrix is quite slow. Please wait patiently for the result.
LFM generates negative samples when running. The larger the ratio of negative to positive samples, the better the performance.
LFM has more parameters to tune, and I haven't spent much time on tuning them. I believe you can do much better!
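For readers unfamiliar with negative sampling in LFM, here is a minimal sketch of one common scheme: for each user, draw unrated items from a popularity-weighted pool until the number of negatives reaches `ratio` times the number of positives. The function name `sample_negatives`, the `item_pool` format, and the retry limit are assumptions for illustration and may not match the repository's implementation.

```python
import random


def sample_negatives(user_items, item_pool, ratio=1, seed=0):
    """Build {item: label} samples for one user (1 = positive, 0 = negative).

    user_items: set of items the user has rated (positives)
    item_pool:  list of item ids in which popular items appear more often,
                so that popular-but-unrated items are sampled more frequently
    ratio:      desired number of negatives per positive
    """
    rng = random.Random(seed)
    samples = {item: 1 for item in user_items}
    wanted = int(len(user_items) * ratio)
    negatives = 0

    # Bound the number of attempts so the loop always terminates.
    for _ in range(wanted * 3):
        if negatives >= wanted:
            break
        item = rng.choice(item_pool)
        if item in samples:
            continue  # already a positive or an already-chosen negative
        samples[item] = 0
        negatives += 1
    return samples
```

Raising `ratio` gives the model more negative evidence per user at the cost of longer training.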
# Licence
Apache License.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
[1]: https://github.com/fuxuemingzhu/MovieLens-Recommender
[2]: https://github.com/Lockvictor/MovieLens-RecSys
[3]: https://githu