This data release lives at https://kaggle.com/seanlahman/baseball and was created by code at https://github.com/benhamner/baseball.
It is a processed version of the 2015 data at www.seanlahman.com/baseball-archive/statistics/.
The original database was copyright 1996-2015 by Sean Lahman and licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/
Likewise, this modified version is released under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/
The two key modifications from the original database were:
- we updated column names to have consistent formatting and naming
- we created a SQLite database in addition to the original CSV files
Below, we're copying README.txt and readme2014.txt from the original 2015 database. Note that the column names may have been modified from the ones stated in readme2014.txt. See https://github.com/benhamner/baseball/blob/master/src/process.py#L43 for the modifications to column names.
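As a minimal sketch of working with the SQLite version, the Python snippet below lists every table in a database file along with its column names, which is a quick way to check the renamed columns against the ones described in readme2014.txt. The file name database.sqlite and the helper name describe_tables are assumptions made for illustration, not something stated in this release; substitute the path of the file you actually downloaded.

```python
import sqlite3


def describe_tables(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        schema[table] = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
    return schema


if __name__ == "__main__":
    # Assumed file name; adjust to wherever the SQLite file was saved.
    conn = sqlite3.connect("database.sqlite")
    for table, columns in describe_tables(conn).items():
        print(table, columns)
    conn.close()
```

Reading the schema from the file itself avoids hard-coding table or column names that may differ from the original documentation.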
README.txt
==============================================================
Baseball Databank is a compilation of historical baseball data in a
convenient, tidy format, distributed under Open Data terms.
This work is licensed under a Creative Commons Attribution-ShareAlike
3.0 Unported License. For details see:
http://creativecommons.org/licenses/by-sa/3.0/
Person identification and demographics data are provided by
Chadwick Baseball Bureau (http://www.chadwick-bureau.com),
from its Register of baseball personnel.
Player performance data for 1871 through 2014 is based on the
Lahman Baseball Database, version 2015-01-24, which is
Copyright (C) 1996-2015 by Sean Lahman.
The tables Parks.csv and HomeGames.csv are based on the game logs
and park code table published by Retrosheet.
This information is available free of charge from and is copyrighted
by Retrosheet. Interested parties may contact Retrosheet at
http://www.retrosheet.org.
readme2014.txt
==============================================================
The Lahman Baseball Database
2014 Version
Release Date: January 24, 2015
----------------------------------------------------------------------
README CONTENTS
0.1 Copyright Notice
0.2 Contact Information
1.0 Release Contents
1.1 Introduction
1.2 What's New
1.3 Acknowledgements
1.4 Using this Database
1.5 Revision History
2.0 Data Tables
2.1 MASTER table
2.2 Batting Table
2.3 Pitching table
2.4 Fielding Table
2.5 All-Star table
2.6 Hall of Fame table
2.7 Managers table
2.8 Teams table
2.9 BattingPost table
2.10 PitchingPost table
2.11 TeamFranchises table
2.12 FieldingOF table
2.13 ManagersHalf table
2.14 TeamsHalf table
2.15 Salaries table
2.16 SeriesPost table
2.17 AwardsManagers table
2.18 AwardsPlayers table
2.19 AwardsShareManagers table
2.20 AwardsSharePlayers table
2.21 FieldingPost table
2.22 Appearances table
2.23 Schools table
2.24 SchoolsPlayers table
----------------------------------------------------------------------
0.1 Copyright Notice & Limited Use License
This database is copyright 1996-2015 by Sean Lahman.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. For details see: http://creativecommons.org/licenses/by-sa/3.0/
For licensing information or further information, contact Sean Lahman
at: seanlahman@gmail.com
----------------------------------------------------------------------
0.2 Contact Information
Web site: http://www.baseball1.com
E-Mail : seanlahman@gmail.com
If you're interested in contributing to the maintenance of this
database or making suggestions for improvement, please consider
joining our mailing list at:
http://groups.yahoo.com/group/baseball-databank/
If you are interested in similar databases for other sports, please
visit the Open Source Sports website at http://OpenSourceSports.com
----------------------------------------------------------------------
1.0 Release Contents
This release of the database can be downloaded in several formats. The
contents of each version are listed below.
MS Access Versions:
lahman2014.mdb
2014readme.txt
SQL version
lahman2014.sql
lahman2014_tables.sql
2014readme.txt
Comma Delimited Version:
2014readme.txt
AllStarFull.csv
Appearances.csv
AwardsManagers.csv
AwardsPlayers.csv
AwardsShareManagers.csv
AwardsSharePlayers.csv
Batting.csv
BattingPost.csv
CollegePlaying.csv
Fielding.csv
FieldingOF.csv
FieldingPost.csv
HallOfFame.csv
Managers.csv
ManagersHalf.csv
Master.csv
Pitching.csv
PitchingPost.csv
Salaries.csv
Schools.csv
SeriesPost.csv
Teams.csv
TeamsFranchises.csv
TeamsHalf.csv
----------------------------------------------------------------------
1.1 Introduction
This database contains pitching, hitting, and fielding statistics for
Major League Baseball from 1871 through 2014. It includes data from
the two current leagues (American and National), the four other "major"
leagues (American Association, Union Association, Players League, and
Federal League), and the National Association of 1871-1875.
This database was created by Sean Lahman, who pioneered the effort to
make baseball statistics freely available to the general public. What
started as a one-man effort in 1994 has grown tremendously, and now a
team of researchers have collected their efforts to make this the
largest and most accurate source for baseball statistics available
anywhere. (See Acknowledgements below for a list of the key
contributors to this project.)
None of what we have done would have been possible without the
pioneering work of Hy Turkin, S.C. Thompson, David Neft, and Pete
Palmer (among others). All baseball fans owe a debt of gratitude
to the people who have worked so hard to build the tremendous set
of data that we have today. Our thanks also to the many members of
the Society for American Baseball Research who have helped us over
the years. We strongly urge you to support and join their efforts.
Please visit their website (www.sabr.org).
If you have any problems or find any errors, please let us know. Any
feedback is appreciated.
----------------------------------------------------------------------
1.2 What's New in 2014
Player stats have been updated through the 2014 season.
Removed two deprecated fields from the batting table. The G_batting and
G_old fields were rendered obsolete when we created the appearances table.
They've been removed from the batting table starting with this version.
SchoolsPlayers has been replaced with a new table called CollegePlaying.
This reflects advances in the compilation of this data, largely led by
Ted Turocy. The old table reported college attendance for major league
players by listing a start date and end date. The new version has a
separate record for each year that a player attended. This allows
us to better account for players who attended multiple colleges or
skipped a season, as well as to identify teammates.
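To illustrate the change in shape, the Python sketch below expands an old-style attendance record (a start year and an end year) into one record per year, then groups per-year records by school and year to find players who overlapped. The field names (player, school, start, end, year), the helper names, and the sample rows are all hypothetical; they are not taken from the actual SchoolsPlayers or CollegePlaying tables.

```python
from collections import defaultdict


def expand_range(player, school, start, end):
    """Turn one old-style (start, end) record into one record per year, inclusive."""
    return [
        {"player": player, "school": school, "year": year}
        for year in range(start, end + 1)
    ]


def overlapping_players(records):
    """Map (school, year) to the sorted players recorded there that year,
    keeping only school-years with at least two players."""
    roster = defaultdict(set)
    for record in records:
        roster[(record["school"], record["year"])].add(record["player"])
    return {key: sorted(players) for key, players in roster.items() if len(players) > 1}


# Made-up sample data for illustration only.
records = expand_range("player_a", "school_x", 2001, 2003)
# Per-year records can express a skipped season or a transfer directly,
# which a single start/end pair cannot.
records += [
    {"player": "player_b", "school": "school_x", "year": 2002},
    {"player": "player_b", "school": "school_y", "year": 2004},
]

print(overlapping_players(records))
```

Note that expanding a range assumes the player attended every year in between, which is exactly the information the old format could not confirm. Sharing a school and year is also only a proxy for being teammates.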
----------------------------------------------------------------------
1.3 Acknowledgements
Much of the raw data contained in this database comes from the work of
Pete Palmer, the legendary statistician, who has had a hand in most
of the baseball encyclopedias published since 1974. He is largely
responsible for bringing the batting, pitching, and fielding data out
of the dark ages and into the computer era. Without him, none of this
would be possible. For more on Pete's work, please read his own
account at: http://sabr.org/cmsfiles/PalmerDatabaseHistory.pdf
Three people have been key contributors to the work that followed, first
by taking the raw data and creating a relational database, and later
by extending the database to make it more accessible to researchers.
Sean Lahman laun