Summary
=======
This dataset (ml-latest-small) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018. This dataset was generated on September 26, 2018.
Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.
The data are contained in the files `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`. More details about the contents and use of all these files follow.
This is a *development* dataset. As such, it may change over time and is not an appropriate dataset for shared research results. See available *benchmark* datasets if that is your intent.
This and other GroupLens data sets are publicly available for download at <http://grouplens.org/datasets/>.
Usage License
=============
Neither the University of Minnesota nor any of the researchers involved can guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:
* The user may not state or imply any endorsement from the University of Minnesota or the GroupLens Research Group.
* The user must acknowledge the use of the data set in publications resulting from the use of the data set (see below for citation information).
* The user may redistribute the data set, including transformations, so long as it is distributed under these same license conditions.
* The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from a faculty member of the GroupLens Research Project at the University of Minnesota.
* The executable software scripts are provided "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of them is with you. Should the program prove defective, you assume the cost of all necessary servicing, repair or correction.
In no event shall the University of Minnesota, its affiliates or employees be liable to you for any damages arising out of the use or inability to use these programs (including but not limited to loss of data or data being rendered inaccurate).
If you have any further questions or comments, please email <grouplens-info@umn.edu>.
Citation
========
To acknowledge use of the dataset in publications, please cite the following paper:
> F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>
Further Information About GroupLens
===================================
GroupLens is a research group in the Department of Computer Science and Engineering at the University of Minnesota. Since its inception in 1992, GroupLens's research projects have explored a variety of fields including:
* recommender systems
* online communities
* mobile and ubiquitous technologies
* digital libraries
* local geographic information systems
GroupLens Research operates a movie recommender based on collaborative filtering, MovieLens, which is the source of these data. We encourage you to visit <http://movielens.org> to try it out! If you have exciting ideas for experimental work to conduct on MovieLens, send us an email at <grouplens-info@cs.umn.edu> - we are always interested in working with external collaborators.
Content and Use of Files
========================
Formatting and Encoding
-----------------------
The dataset files are written as [comma-separated values](http://en.wikipedia.org/wiki/Comma-separated_values) files with a single header row. Columns that contain commas (`,`) are escaped using double-quotes (`"`). These files are encoded as UTF-8. If accented characters in movie titles or tag values (e.g. Misérables, Les (1995)) display incorrectly, make sure that any program reading the data, such as a text editor, terminal, or script, is configured for UTF-8.
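As a minimal sketch of reading these files correctly, Python's standard `csv` module handles the double-quote escaping, so a comma inside a quoted title stays in the title field. The inline `SAMPLE` text and its movie ids are illustrative assumptions, not rows copied from the shipped `movies.csv`; a real script would open the file with `encoding="utf-8"`.

```python
import csv
import io

# Illustrative stand-in for movies.csv (hypothetical ids). A real script would use
# open("movies.csv", encoding="utf-8", newline="") instead of io.StringIO.
SAMPLE = (
    "movieId,title,genres\n"
    '101,"American President, The (1995)",Comedy|Drama|Romance\n'
    '102,"Misérables, Les (1995)",Drama|War\n'
)

def read_rows(text):
    """Parse CSV text that has a single header row into a list of dicts."""
    # DictReader honors double-quote escaping, so commas inside quotes
    # remain part of the field instead of splitting it.
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(SAMPLE)
for row in rows:
    print(row["movieId"], row["title"])
```

Splitting each line with `str.split(",")` would break both sample titles into two fields, which is why a real CSV parser is preferable here.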
User Ids
--------
MovieLens users were selected at random for inclusion. Their ids have been anonymized. User ids are consistent between `ratings.csv` and `tags.csv` (i.e., the same id refers to the same user across the two files).
Movie Ids
---------
Only movies with at least one rating or tag are included in the dataset. These movie ids are consistent with those used on the MovieLens web site (e.g., id `1` corresponds to the URL <https://movielens.org/movies/1>). Movie ids are consistent between `ratings.csv`, `tags.csv`, `movies.csv`, and `links.csv` (i.e., the same id refers to the same movie across these four data files).
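Because `movieId` is shared across the files, ratings can be joined to titles with a plain dictionary lookup. The sketch below computes a mean rating per title; the inline samples (including the hypothetical movie `100`) are illustrative assumptions, not data from the dataset.

```python
import csv
import io
from collections import defaultdict

# Illustrative samples only; real code would read ratings.csv and movies.csv.
RATINGS = (
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,0\n"
    "2,1,3.0,0\n"
    "1,100,5.0,0\n"
)
MOVIES = (
    "movieId,title,genres\n"
    "1,Toy Story (1995),Animation|Comedy\n"
    "100,Hypothetical Movie (2000),Drama\n"
)

def mean_rating_by_title(ratings_text, movies_text):
    """Return {title: mean rating}, joining the two files on movieId."""
    titles = {row["movieId"]: row["title"]
              for row in csv.DictReader(io.StringIO(movies_text))}
    totals = defaultdict(lambda: [0.0, 0])  # movieId -> [sum, count]
    for row in csv.DictReader(io.StringIO(ratings_text)):
        acc = totals[row["movieId"]]
        acc[0] += float(row["rating"])
        acc[1] += 1
    # Ids are consistent across files, so every rated movieId has a title.
    return {titles[mid]: s / n for mid, (s, n) in totals.items()}

print(mean_rating_by_title(RATINGS, MOVIES))
```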
Ratings Data File Structure (ratings.csv)
-----------------------------------------
All ratings are contained in the file `ratings.csv`. Each line of this file after the header row represents one rating of one movie by one user, and has the following format:
userId,movieId,rating,timestamp
The lines within this file are ordered first by userId, then, within user, by movieId.
Ratings are made on a 5-star scale, with half-star increments (0.5 stars - 5.0 stars).
Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
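A minimal sketch of parsing one data line of `ratings.csv`, checking the half-star scale and converting the timestamp to a UTC `datetime`. It assumes rating lines contain no quoted fields (all four columns are numeric), so a plain `split(",")` is sufficient; the function name `parse_rating` is hypothetical.

```python
from datetime import datetime, timezone

def parse_rating(line):
    """Parse one non-header line of ratings.csv (hypothetical helper)."""
    user_id, movie_id, rating, timestamp = line.strip().split(",")
    stars = float(rating)
    # Valid ratings are 0.5, 1.0, ..., 5.0, so doubling yields an integer 1..10.
    if not (0.5 <= stars <= 5.0 and (stars * 2).is_integer()):
        raise ValueError(f"unexpected rating {rating!r}")
    # Seconds since 1970-01-01 00:00 UTC, interpreted explicitly in UTC.
    when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return int(user_id), int(movie_id), stars, when

print(parse_rating("1,1,4.0,86400"))
```

Passing `tz=timezone.utc` avoids silently converting to the machine's local time zone.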
Tags Data File Structure (tags.csv)
-----------------------------------
All tags are contained in the file `tags.csv`. Each line of this file after the header row represents one tag applied to one movie by one user, and has the following format:
userId,movieId,tag,timestamp
The lines within this file are ordered first by userId, then, within user, by movieId.
Tags are user-generated metadata about movies. Each tag is typically a single word or short phrase. The meaning, value, and purpose of a particular tag are determined by each user.
Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970.
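Since tag text is free-form, it may itself contain commas, so `tags.csv` should be read with a CSV parser. The sketch below groups distinct tags per movie; lower-casing tags is an assumption made here for illustration (users may intend different meanings for differently cased tags), and the inline sample is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; note the quoted tag containing a comma.
SAMPLE = (
    "userId,movieId,tag,timestamp\n"
    "10,1,funny,0\n"
    "20,1,Funny,0\n"
    "20,1,pixar,0\n"
    '10,2,"dark, but funny",0\n'
)

def tags_by_movie(text):
    """Map movieId -> sorted list of distinct, lower-cased tags."""
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        grouped[int(row["movieId"])].add(row["tag"].strip().lower())
    return {movie_id: sorted(tags) for movie_id, tags in grouped.items()}

print(tags_by_movie(SAMPLE))
```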
Movies Data File Structure (movies.csv)
---------------------------------------
Movie information is contained in the file `movies.csv`. Each line of this file after the header row represents one movie, and has the following format:
movieId,title,genres
Movie titles are entered manually or imported from <https://www.themoviedb.org/>, and include the year of release in parentheses. Errors and inconsistencies may exist in these titles.
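Because the release year is embedded in the title, it can be split off with a regular expression anchored at the end of the string. This is a sketch under the assumption that the year appears as a trailing `(YYYY)`; since titles may contain errors, the hypothetical `split_title` helper returns `None` for the year rather than raising when no match is found.

```python
import re

# A trailing "(YYYY)", optionally followed by whitespace.
_YEAR_AT_END = re.compile(r"\((\d{4})\)\s*$")

def split_title(title):
    """Split 'Name (YYYY)' into ('Name', YYYY); year is None if absent."""
    match = _YEAR_AT_END.search(title)
    if match is None:
        # Malformed or year-less titles are returned unchanged.
        return title.strip(), None
    return title[: match.start()].strip(), int(match.group(1))

print(split_title("Misérables, Les (1995)"))
```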
Genres are a pipe-separated list, and are selected from the following:
* Action
* Adventure
* Animation
* Children's
* Comedy
* Crime
* Documentary
* Drama
* Fantasy
* Film-Noir
* Horror
* Musical
* Mystery
* Romance
* Sci-Fi
* Thriller
* War
* Western
* (no genres listed)
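A minimal sketch of turning the pipe-separated `genres` field into a list and counting genres across movies. Mapping the `(no genres listed)` placeholder to an empty list is a design choice assumed here; keeping it as its own pseudo-genre would also be reasonable.

```python
from collections import Counter

NO_GENRES = "(no genres listed)"

def parse_genres(field):
    """Split a pipe-separated genres field; the placeholder becomes []."""
    if field == NO_GENRES:
        return []
    return field.split("|")

def genre_counts(genre_fields):
    """Count how many movies carry each genre."""
    counts = Counter()
    for field in genre_fields:
        counts.update(parse_genres(field))
    return counts

print(genre_counts(["Comedy|Drama", "Drama", NO_GENRES]))
```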
Links Data File Structure (links.csv)
---------------------------------------
Identifiers that can be used to link to other sources of movie data are contained in the file `links.csv`. Each line of this file after the header row represents one movie, and has the following format:
movieId,imdbId,tmdbId
movieId is an identifier for movies used by <https://movielens.org>. E.g., the movie Toy Story has the link <https://movielens.org/movies/1>.
imdbId is an identifier for movies used by <http://www.imdb.com>. E.g., the movie Toy Story has the link <http://www.imdb.com/title/tt0114709/>.
tmdbId is an identifier for movies used by <https://www.themoviedb.org>. E.g., the movie Toy Story has the link <https://www.themoviedb.org/movie/862>.
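The three URL patterns above can be rebuilt from a `links.csv` row as sketched below. Two assumptions are labeled in the code: if `imdbId` was parsed as an integer its leading zeros are lost, so it is zero-padded back to at least 7 digits, and an empty `tmdbId` is tolerated in case a row lacks one. The helper name `external_urls` is hypothetical.

```python
def external_urls(movie_id, imdb_id, tmdb_id):
    """Build MovieLens, IMDb and TMDB URLs for one links.csv row (hypothetical helper)."""
    urls = {"movielens": f"https://movielens.org/movies/{movie_id}"}
    if imdb_id not in (None, ""):
        # Assumption: restore leading zeros dropped by integer parsing (e.g. 114709 -> 0114709).
        urls["imdb"] = f"http://www.imdb.com/title/tt{str(imdb_id).zfill(7)}/"
    if tmdb_id not in (None, ""):
        # Assumption: some rows may have an empty tmdbId, in which case no TMDB URL is built.
        urls["tmdb"] = f"https://www.themoviedb.org/movie/{tmdb_id}"
    return urls

print(external_urls("1", "0114709", "862"))
```

For Toy Story this reproduces the three links listed above.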
Use of the resources listed above is subject to the terms of each provider.
Cross-Validation
----------------
Prior versions of the MovieLens dataset included either pre-computed cross-folds or scripts to perform this computation.
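As an illustration of computing cross-folds yourself, the sketch below assigns each rating to one of `k` folds with a seeded shuffle and then builds a train/test split. This is one simple rating-level scheme chosen for illustration, not the scheme used by earlier MovieLens releases; recommender evaluations often split per user instead. The names `assign_folds` and `train_test` are hypothetical.

```python
import random

def assign_folds(n_ratings, k=5, seed=0):
    """Return a fold index in [0, k) for each of n_ratings ratings."""
    order = list(range(n_ratings))
    random.Random(seed).shuffle(order)  # seeded, so folds are reproducible
    folds = [0] * n_ratings
    for position, index in enumerate(order):
        # Round-robin over the shuffled order keeps fold sizes within one of each other.
        folds[index] = position % k
    return folds

def train_test(ratings, folds, test_fold):
    """Use one fold as the test set and the remaining folds as training data."""
    train = [r for r, f in zip(ratings, folds) if f != test_fold]
    test = [r for r, f in zip(ratings, folds) if f == test_fold]
    return train, test
```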