playerElo: Factoring Strength of Schedule into MLB Player Analysis (R)

29 files in total

csv: 19

r: 3

dcf: 2


198 views · Uploaded 2023-04-15 09:36:00 · 24.7MB ZIP


playerElo: Factoring Strength of Schedule into MLB Player Analysis (R), download .zip (29 files)

playerElo_2019-master

.DS_Store 6KB

rsconnect

shinyapps.io

jrichey

playerelo.dcf 243B

documents

02_app.R

shinyapps.io

jrichey

playerElo.dcf 253B

.RData 14.06MB

03_app.R 10KB

data

p19EloDisplay.csv 52KB

mlb-player-stats-Batters.csv 52KB

.DS_Store 6KB

p18Elo.csv 20KB

teamElo19.csv 2KB

all_playerElo.csv 13.3MB

expected_stats-6.csv 73KB

eloTeam19.csv 2KB

exit_velocity-2.csv 72KB

b19Elo_disp.csv 56KB

expected_stats-7.csv 63KB

b19EloDisplay.csv 55KB

stateMatrix.csv 3KB

allPlayerEloX.csv 12.81MB

standings.csv 5KB

ParkFactors.csv 787B

b18Elo.csv 24KB

p19Elo_disp.csv 49KB

mlb-player-stats-P.csv 54KB

exit_velocity-3.csv 68KB

01_regular_app_construction.R 23KB

02_postseason_app_construction.R 16KB

README.md 17KB

.Rhistory 22KB

# playerElo 2019

Open the playerElo App here: https://jrichey.shinyapps.io/playerelo/

View playerElo published on FanGraphs here: https://community.fangraphs.com/playerelo-factoring-strength-of-schedule-into-player-analysis/

###### *For the following article, all numbers are updated to September 10th, 2019.*

###### *Data sourced from Baseball Savant, RotoWire, Baseball Reference, Retrosheet, and BigDataBall.*

## Abstract

With the sabermetric revolution in MLB, a plethora of new statistics have come into the mainstream, and a growing number of fantasy owners, ballclubs, and regular fans are turning to these new statistical methods for player analysis. However, I propose that even advanced metrics such as wOBA, FIP, xwOBA, xFIP, and wRC+ are all missing a crucial element needed to accurately represent player performance thus far. The playerElo system is able to reveal in aggregate the effects of previously unconsidered aspects of the game. Using an Elo ranking system determined by run-value calculations of all major league baseball players, the model incorporates context-dependent analysis and quality of competition to produce a proper evaluation of batters and pitchers. This enables playerElo to appropriately credit pitchers, especially relievers, for their true impact on the game, particularly when called upon in disadvantageous situations. Additionally, playerElo does not allow relative team strength, which confounds common counting statistics, to influence the evaluation of a player. The model is a holistic approach to the assessment of major league players and has significant ramifications for player projections during free agency and player acquisition.

## Introduction

Consider the following comparison between Freddie Freeman (29) and Carlos Santana (33). Both players were starters for the 2019 All-Star teams of their respective leagues and are enjoying breakout seasons, beyond their usual high production level, with nearly identical statistics across the board.

| Player | PA | wOBA | xwOBA | wRC+ |
| :-: | :-: | :-: | :-: | :-: |
| Freeman, 1B | 643 | 0.398 | 0.396 | 144 |
| Santana, 1B | 624 | 0.389 | 0.371 | 141 |

*Data from FanGraphs, Baseball Savant.*

However, I argue there is an underlying statistic that makes Santana’s success less impressive and Freeman’s worthy of MVP consideration. Consider the quality of the pitchers each has faced. The Atlanta Braves’ division, the NL East, contains the respectable pitching competition of the Mets (11th league-wide in ERA), Nationals (12th), Phillies (17th), and Marlins (21st). Contrast this with the competition of the Cleveland Indians in the AL Central: the Twins (8th), White Sox (23rd), Royals (26th), and Tigers (28th). Over his first 500 plate appearances, Santana faced a top 15 pitcher (ranked by FIP) just 15 times, compared with 43 times for Freeman.

wRC+ controls for park effects and the current run environment, while xwOBA takes into account quality of contact, but all modern sabermetrics fail to address the problem of Freeman and Santana’s near-equal statistics, despite widely different qualities of competition. Thus, I present the modeling system of playerElo.

## Methodology

Conceived out of inspiration from Arpad Elo’s rating system for zero-sum games like chess, as well as FiveThirtyEight’s use of an Elo modeling scheme for MLB team ratings and season-wide predictions, playerElo treats all at-bats as events and maintains a running power ranking of all MLB batters and pitchers. The system uses expected run values over the 24 possible base-out states. Additionally, run values are calculated for each at-bat event by subtracting the run expectancy of the beginning state from that of the ending state, and adding the runs scored.

`Run Value of Play = RE End State - RE Beginning State + Runs Scored`

The following run expectancy matrix presents the expected runs scored for the remainder of the inning, given the current run environment, baserunners, and number of outs.
Data is sourced from all at-bats from 2016-2018, and expected run values are rounded to the second decimal place. For example, a grand slam hit with one out would shift the run expectancy from 1.54 to 0.27 and score four runs, so the run value of the play would be 2.73.

| 1B | 2B | 3B | 0 outs | 1 out | 2 outs |
| :-: | :-: | :-: | :-: | :-: | :-: |
| -- | -- | -- | 0.51 | 0.27 | 0.11 |
| 1B | -- | -- | 0.88 | 0.52 | 0.22 |
| -- | 2B | -- | 1.15 | 0.69 | 0.32 |
| -- | -- | 3B | 1.39 | 0.97 | 0.36 |
| 1B | 2B | -- | 1.45 | 0.93 | 0.44 |
| 1B | -- | 3B | 1.77 | 1.20 | 0.48 |
| -- | 2B | 3B | 1.97 | 1.40 | 0.56 |
| 1B | 2B | 3B | 2.21 | 1.54 | 0.75 |

*Data from Retrosheet, 2016-2018.*

The model begins with a calibration year of 2018, and for 2019, players begin with their previous season’s ending playerElo, regressed to the mean slightly. If a player did not have a single plate appearance or batter faced in 2018, such as Vladimir Guerrero Jr. or Chris Paddack, then they are assigned a baseline playerElo of 1000 (the calibration year of 2018 began every player at 1000).

For every at-bat, given the current base-out state, an expected run value for both the batter and pitcher is calculated, based on quadratic formulas of historic performance of players of that caliber in the given situation. The dependency of the Elo formula on the base-out state ensures the model is context-dependent, meaning it incorporates the fact that a bases-loaded double is far more valuable than a double with the bases empty; however, it also takes into account that runs were more likely to be scored in the former situation than in the latter.

It is important to note playerElo is a raw batting statistic and does not evaluate overall production, meaning stolen bases are not factored into the ranking system.
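
The run-value formula and the run expectancy matrix above can be checked with a short calculation. The following is a minimal illustrative sketch in Python, not the repository's own R code: the names `RUN_EXPECTANCY`, `run_expectancy`, and `run_value` are hypothetical, and treating a three-out state as worth 0.0 runs is an assumption that the text does not state explicitly.

```python
# Run expectancy keyed by (runner on 1B, runner on 2B, runner on 3B), with one
# value per out count (0, 1, 2). Values copied from the matrix above
# (Retrosheet, 2016-2018).
RUN_EXPECTANCY = {
    (False, False, False): (0.51, 0.27, 0.11),
    (True, False, False): (0.88, 0.52, 0.22),
    (False, True, False): (1.15, 0.69, 0.32),
    (False, False, True): (1.39, 0.97, 0.36),
    (True, True, False): (1.45, 0.93, 0.44),
    (True, False, True): (1.77, 1.20, 0.48),
    (False, True, True): (1.97, 1.40, 0.56),
    (True, True, True): (2.21, 1.54, 0.75),
}


def run_expectancy(bases, outs):
    """Expected runs for the rest of the inning in a given base-out state."""
    if outs >= 3:
        # Assumption: the inning is over, so no further runs are expected.
        return 0.0
    return RUN_EXPECTANCY[bases][outs]


def run_value(start_bases, start_outs, end_bases, end_outs, runs_scored):
    """Run Value of Play = RE End State - RE Beginning State + Runs Scored."""
    return (run_expectancy(end_bases, end_outs)
            - run_expectancy(start_bases, start_outs)
            + runs_scored)


LOADED = (True, True, True)
EMPTY = (False, False, False)

# Grand slam with one out: bases loaded -> bases empty, four runs score.
grand_slam = run_value(LOADED, 1, EMPTY, 1, 4)
print(round(grand_slam, 2))  # 2.73, matching the worked example in the text

# Leadoff strikeout: bases empty, 0 outs -> bases empty, 1 out, no runs.
leadoff_out = run_value(EMPTY, 0, EMPTY, 1, 0)
print(round(leadoff_out, 2))  # -0.24
```

Keying the table on a tuple of base occupancy keeps all 24 base-out states in a single lookup, so the same function covers any play once its beginning and ending states are known.
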
Additionally, while the model does not take defense into account, it also does not count stolen bases or passed balls negatively against a pitcher, and likewise does not count changes in game states due to wild pitches positively for a batter.

Once an expected run value is synthesized from the current state and the playerElo of the batter and the pitcher, park factor and home field advantage adjustments (if applicable) are made, and the expected run value of the play is then compared to the true run value outcome. The playerElo ratings of both the batter and pitcher are then updated accordingly, depending on the difference between the true run value and the expected run value. For example, if an excellent pitcher strikes out a mediocre batter, the batter will not lose much Elo, and the pitcher will not gain much Elo. Likewise, if a below-average batter does extremely well against a top pitcher, there will be a far greater change in the Elo of both players. Errors are also taken into account and will prevent a positive run value from counting against a pitcher or positively for a batter.

Refer to the Technical Appendix at the end of the README.md for further details regarding the playerElo methodology.

## Player Analysis

![playerElo Top 25](https://user-images.githubusercontent.com/22247220/64912297-1fca5580-d6fb-11e9-988d-b4f9442d5576.png)

It is interesting to note Nolan Arenado and Edwin Encarnacion do particularly well in the model, even with park factor adjustments. This can be attributed to the difficulty of schedule of the Rockies and Yankees, who face the tough pitching competition in the NL West and AL East respectively. The average pitching Elo faced by Arenado and Encarnacion is 1010.5 and 1009.6 respectively (20th and 25th highest overall).
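
An "average pitching Elo faced" figure like the ones quoted above could be computed along the following lines. This is a hedged Python sketch, not the repository's R code: it assumes a simple unweighted mean over plate appearances (the text does not say how the average is weighted), and the log entries, player labels, and the function name `average_elo_faced` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical plate-appearance log: (batter, opposing pitcher's Elo at the
# time of the plate appearance). Values are invented, not playerElo data.
plate_appearances = [
    ("Batter A", 1040.0),
    ("Batter A", 985.0),
    ("Batter A", 1010.0),
    ("Batter B", 970.0),
    ("Batter B", 1000.0),
]


def average_elo_faced(log):
    """Return each batter's mean opposing-pitcher Elo over all logged PAs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for batter, pitcher_elo in log:
        totals[batter] += pitcher_elo
        counts[batter] += 1
    return {batter: totals[batter] / counts[batter] for batter in totals}


averages = average_elo_faced(plate_appearances)

# Rank batters from the toughest schedule to the easiest.
ranking = sorted(averages, key=averages.get, reverse=True)
print(averages)
print(ranking)
```

Sorting by this average gives the kind of schedule-difficulty ranking referred to in the text, with a higher average meaning tougher opposing pitching.
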
Quality of contact does leave something to be desired for Arenado; however, playerElo does not incorporate statistics like exit velocity and launch angle in its calculations, and thus the model is a better reflection of on-field performance than underlying swing metrics. In contrast to Arenado and Encarnacion, Yordan Álvarez has played incredibly since being called up in June but has faced some of the easiest competition i
