Notes:
- Feb 2023: I published a Python package called **fantasy_ga** for generating fantasy lineups with a genetic algorithm; check it out at https://pypi.org/project/fantasy-ga/. This repository will be updated to incorporate this library, along with other updates and bug fixes.
## NBA Player Performance Prediction and Lineup Optimization
Prediction of NBA player performance, defined as fantasy points by DraftKings. This capstone project was originally conducted and approved by a reviewer as part of the Machine Learning Engineer Nanodegree by Udacity. See the final report [here](https://github.com/KengoA/fantasy-basketball/blob/master/report.pdf) for details.
Note that the code has been updated since the report was written, so the two do not necessarily match. This project is undergoing minor refactoring and documentation as of Jun 2019; feel free to reach out to me via email at [email protected].
### What We'll Do
The end goal of this project is to generate a series of lineups for the fantasy basketball website [DraftKings](https://www.draftkings.com/). To achieve that, we'll scrape player statistics from each regular season game starting in the 2014-15 season, as well as past fantasy salary information. First, we'll build a predictive model for player performance, and then we'll use a genetic algorithm to construct fantasy lineups that maximize the total fantasy points while satisfying the salary constraint.
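The lineup step can be sketched in a few lines of plain Python. This is a minimal illustration of a genetic algorithm under a salary cap, not the code from the notebooks or from fantasy_ga: the player pool is synthetic, the function names and hyperparameters are hypothetical, and positional constraints are ignored for brevity.

```python
import random

SALARY_CAP = 50_000   # DraftKings salary cap
LINEUP_SIZE = 8       # players per lineup


def make_pool(n, rng):
    """Build a synthetic pool of (salary, predicted_fpts) pairs."""
    pool = []
    for _ in range(n):
        salary = rng.randrange(3_000, 12_000, 100)
        # Made-up relationship: about 5 points per $1,000 plus noise.
        pool.append((salary, salary / 1_000 * 5 + rng.gauss(0, 5)))
    return pool


def total_salary(lineup, pool):
    return sum(pool[i][0] for i in lineup)


def fitness(lineup, pool):
    """Total predicted points; lineups over the cap are rejected."""
    if total_salary(lineup, pool) > SALARY_CAP:
        return float("-inf")
    return sum(pool[i][1] for i in lineup)


def random_feasible(pool, rng):
    """Sample random lineups until one fits under the cap."""
    while True:
        lineup = tuple(sorted(rng.sample(range(len(pool)), LINEUP_SIZE)))
        if total_salary(lineup, pool) <= SALARY_CAP:
            return lineup


def crossover(a, b, rng):
    """Child draws its players from the union of both parents."""
    return tuple(sorted(rng.sample(sorted(set(a) | set(b)), LINEUP_SIZE)))


def mutate(lineup, pool, rng, rate=0.2):
    """With probability `rate`, swap one player for one outside the lineup."""
    if rng.random() >= rate:
        return lineup
    players = list(lineup)
    outside = [i for i in range(len(pool)) if i not in lineup]
    players[rng.randrange(LINEUP_SIZE)] = rng.choice(outside)
    return tuple(sorted(players))


def optimize(pool, generations=100, pop_size=60, seed=0):
    rng = random.Random(seed)
    population = [random_feasible(pool, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda l: fitness(l, pool), reverse=True)
        parents = population[: pop_size // 2]   # the fitter half survives
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), pool, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda l: fitness(l, pool))


pool = make_pool(80, random.Random(42))
best = optimize(pool)
print(best, total_salary(best, pool), round(fitness(best, pool), 1))
```

Because the fitter half of each generation is carried over unchanged, the best feasible lineup found so far is never lost; infeasible children simply rank last and are discarded.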
### What We Won't Do
Given the complexity of the series structure and the different nature of those games, we won't be considering playoff games. In addition, a major factor omitted from our analysis is the opponent's defensive ability as a team or at a given position (for instance, Paul George is excellent at stealing the ball), which arguably is one of the most important factors of the game. Tackling this aspect will be an easy improvement to this project.
### Requirements
Along with the libraries specified in [requirements.txt](requirements.txt), you need to sign up for a free Plotly account to create interactive visualizations.
### Understanding Fantasy Sports
The key to selecting a good fantasy lineup is to identify players who are consistent performers. This basic intuition comes from Harry Markowitz's [Modern Portfolio Theory (MPT)](https://www.investopedia.com/terms/m/modernportfoliotheory.asp), and the following scatter plot looks at the relationship between risk and return, where return is the average fantasy points over a given range of games (in this case, the past 10 games) and risk is its standard deviation. For a given level of risk (x-axis), a player with a better return is considered superior. The plot is based on statistics from late in the 2018-19 season, where, for instance, LeBron James is shown in the top-left with a 10-game average of a whopping 58.1 fantasy points and a standard deviation of 7.4. The top curve running through Harden (top right), LeBron, Paul George, Gobert, Sexton, and Zubac can be considered the [Efficient Frontier](https://www.investopedia.com/terms/e/efficientfrontier.asp) in the framework of MPT. In general, players on the outer left of the cluster are considered good assets with low risk and high return. Players are color-coded based on rough positions of PG, SG, F (SF, PF, SF/PF), and C (PF/C, C). A fully interactive version can be accessed [here](https://plot.ly/~KengoA/12/_10-game-risk-return-relationship/#/).
![10-game risk-return](assets/risk_return.gif)
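As a rough illustration of how the two axes of this plot are computed, the sketch below takes a list of per-game fantasy points and returns the 10-game average (return) and standard deviation (risk). The game logs are made up, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which variant was used.

```python
from statistics import mean, stdev


def risk_return(fpts_by_game, window=10):
    """Return (average, sample standard deviation) over the last `window` games."""
    recent = fpts_by_game[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two games to estimate risk")
    return mean(recent), stdev(recent)


# Made-up game logs for two hypothetical players with the same average.
logs = {
    "steady": [40, 42, 38, 41, 39, 40, 43, 37, 41, 39],
    "streaky": [60, 20, 55, 25, 58, 22, 50, 30, 45, 35],
}

for name, games in logs.items():
    avg, risk = risk_return(games)
    print(f"{name}: return={avg:.1f}, risk={risk:.1f}")
```

Both players average the same number of points, but the steadier one sits further left on the plot and would be preferred under the risk-return view described above.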
Another important dimension of fantasy basketball is player salary. Fantasy sports websites like DraftKings determine player salaries from previous performance and roster information in a semi-automatic fashion, such that it is more costly to include "stud" players with a high expected return (LeBron, Westbrook, Harden) in your lineup. DraftKings has a salary cap of $50,000 for a selection of 8 players, giving each player an average salary of $6,250. The graph below shows a scatter plot of each player's salary against his actual performance on the day, for the latest games of the 2018-19 season. One striking insight is that while it is easy to identify studs like Westbrook, who had a total of 61.5 fantasy points for his 11.8k salary on April 10, "value" players who exceed expectations are much more difficult to find, with a large variance at any given salary level. For instance, Jamal Crawford in the top left had a monstrous performance with a combined 70.25 fantasy points, despite the low expectation implied by his \$4300 salary. These "value" players are what differentiate winning lineups from those of a typical beginner with a collection of star players and underachieving benchwarmers. Finding them requires deeper insight into who will outperform expectations. For instance, an injury to a starting player most likely increases minutes for the other starters and for the second-option player on the bench. A fully interactive version can be accessed [here](https://plot.ly/~KengoA/14/salary-return-relationship/#/).
![salary-return](assets/salary_return.gif)
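The arithmetic behind this comparison can be made explicit. The sketch below uses the figures quoted above and a simple "fantasy points per $1,000 of salary" ratio; that ratio is an illustrative measure chosen here, not a metric taken from the notebooks, and the function name is hypothetical.

```python
SALARY_CAP = 50_000
ROSTER_SIZE = 8


def points_per_thousand(fpts, salary):
    """Fantasy points earned per $1,000 of salary (illustrative value measure)."""
    return fpts / (salary / 1_000)


average_salary = SALARY_CAP / ROSTER_SIZE
print(f"average salary per roster slot: ${average_salary:,.0f}")

# (fantasy points, salary) pairs quoted in the text above.
examples = {"Westbrook": (61.5, 11_800), "Crawford": (70.25, 4_300)}
for name, (fpts, salary) in examples.items():
    print(f"{name}: {points_per_thousand(fpts, salary):.2f} FPTS per $1k")
```

By this measure, Crawford returned roughly three times as many points per dollar as Westbrook on those nights (about 16.3 versus 5.2), which is what makes such players valuable under a fixed cap.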
### Project Structure
This project consists of 9 Jupyter notebooks, whose functionalities are described below. The second half covers constructing machine learning models, making inference, and optimising lineups for DraftKings.
- [1_data_scraping.ipynb](notebooks/1_data_scraping.ipynb) scrapes games data from Basketball-Reference.com and salary and position information from RotoGuru.
- [2_merging_data.ipynb](notebooks/2_merging_data.ipynb) merges the two datasets with name standardisation and preliminary preprocessing of data such as calculation of FPTS based on the key statistics.
- [3_exploratory_analysis.ipynb](notebooks/3_exploratory_analysis.ipynb) visually explores the relationships between salary and actual FPTS, and between expected FPTS and the standard deviation over the past 10 games.
- [4_feature_engineering.ipynb](notebooks/4_feature_engineering.ipynb) constructs the baseline model with a simple average, along with three additional datasets with weighted averages, where several features are engineered and incorporated.
- [5_baseline_models.ipynb](notebooks/5_baseline_models.ipynb) sets up the baseline model with the simple season average adopted by DraftKings, and linear regression models with feature selection. For notebooks 06-08, we use 5-fold cross-validation to approximate model errors.
- [6_lightgbm_bayesian_optimization.ipynb](notebooks/6_lightgbm_bayesian_optimization.ipynb) uses Bayesian optimisation to find the best parameters for a boosting model using LightGBM. Parameters and their results are saved in a text file.
- [7_neural_networks.ipynb](notebooks/7_neural_networks.ipynb) constructs three neural network models using Keras, and saves model weights only when there is an improvement. While deep learning models might not suit this dataset of limited size, they show an improvement compared to the boosting models.
![learning](assets/learning.png)
- [8_predictions.ipynb](notebooks/8_predictions.ipynb) trains on the whole dataset except for the month of March 2019, where each contest's cashline for double up was manually obtained from RotoGrinders. Inference is made on this test data from March 2019.
- [9_lineup_GA_optimization.ipynb](notebooks/9_lineup_GA_optimization.ipynb) uses a genetic algorithm to select the best combinations of players for a given set of games and predictions. The performance of the lineups chosen by the algorithm against other DraftKings users is examined for contests held in March 2019. Note that the contest data was obtained manually from RotoGrinders' ResultsDB page, without scraping. Predictions from the baseline model and the final model are compared to the actual performance. The following figure shows the optimal lineup this model returns, with differences between the actual FPTS and the FPTS predicted by the neural network and baseline models.
![lineup](assets/lineup.png)
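The FPTS calculation mentioned for notebook 2 can be sketched as a weighted sum of box-score statistics plus bonuses. The weights below follow the DraftKings classic NBA scoring rules as commonly published; treat them as an assumption and check them against the site and against the notebook before relying on them. The dictionary keys and the function name are hypothetical.

```python
# Assumed DraftKings classic NBA scoring weights (verify before use).
WEIGHTS = {
    "pts": 1.0,    # points
    "fg3m": 0.5,   # made three-pointers
    "reb": 1.25,   # rebounds
    "ast": 1.5,    # assists
    "stl": 2.0,    # steals
    "blk": 2.0,    # blocks
    "tov": -0.5,   # turnovers
}
DOUBLE_CATEGORIES = ("pts", "reb", "ast", "stl", "blk")


def fantasy_points(stats):
    """Weighted box-score sum plus double-double and triple-double bonuses."""
    base = sum(weight * stats.get(key, 0) for key, weight in WEIGHTS.items())
    doubles = sum(stats.get(key, 0) >= 10 for key in DOUBLE_CATEGORIES)
    bonus = 0.0
    if doubles >= 2:
        bonus += 1.5   # double-double
    if doubles >= 3:
        bonus += 3.0   # triple-double, assumed to stack with the double-double
    return base + bonus


# Made-up stat line for a hypothetical player.
line = {"pts": 30, "fg3m": 4, "reb": 10, "ast": 8, "stl": 2, "blk": 1, "tov": 3}
print(fantasy_points(line))
```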
Main procedures are coded and explained in markdown using Jupyter Notebook. Although not required, jupyter nbexten