Jump to content

How to build your own Baseball Stats database

0
  MikeH's Photo
Posted Dec 14 2009 02:54 PM

As a kid, I did not like math until I figured out it was related to sports. As I grew older, I realized that stats were not only related, but closely intertwined - and this was long before MoneyBall. I got the Stats bug early and thankfully it helped me become interested in other forms of math. I still believe that Statistics are the most practical and useful, to me. One way to get your sports-crazed adolescent interested in geeking around with stats is to build a Baseball database and show them how to have fun. Here is a quick primer. For more really cool Hacks, check out Joe Adler's Baseball Hacks.

Step 1. Get the Data.

This url has an excellent baseball archive that can be loaded into a MySql database. Once you down load it, unarchive it.

http://www.baseball-...9-11-25.sql.zip

Once it is unarchived it will be called: BDB-sql-2009-11-25.sql

Step 2. Create a Database for putting your data into.

I'll assume you have downloaded a working copy of the MySQL database tools that are right for your computer.

Once you have it all installed, and you have given yourself permission to create a database, type this:

mysql> GRANT ALL ON bbdatabank.* TO 'mikeh'@'localhost' IDENTIFIED BY 'p@ssw0rd';


Obviously use your login name and password.

mysql> CREATE DATABASE bbdatabank;
Query OK, 1 row affected (0.00 sec)

mysql> quit;
Bye


Step 3. Import the Database

Remember the file name BDB-sql-2009-11-25.sql, well you need to also remember where you unarchived it on your hard drive. Cause you are now going to import it. I'll use the user name and password from above to import this data.

at your shell [terminal window], or command file system type:
mysql -u mikeh -p -s bbdatabank < BDB-sql-2009-11-25.sql 
{hit Enter} password: P@ssw0rd


This will now load your data.

Step 4. Verify you got everything:
mysql> show tables;
+-------------------------+
| Tables_in_bbdatabank |
+-------------------------+
| Allstar 	| 
| AllstarFull 	| 
| Appearances 	| 
| AwardsManagers 	| 
| AwardsPlayers 	| 
| AwardsShareManagers | 
| AwardsSharePlayers | 
| Batting 	| 
| BattingPost 	| 
| Fielding 	| 
| FieldingOF 	| 
| FieldingPost 	| 
| HOFold 	| 
| HallOfFame 	| 
| Managers 	| 
| ManagersHalf 	| 
| Master 	| 
| Pitching 	| 
| PitchingPost 	| 
| Salaries 	| 
| Schools 	| 
| SchoolsPlayers 	| 
| SeriesPost 	| 
| Teams 	| 
| TeamsFranchises 	| 
| TeamsHalf 	| 
| xref_stats 	| 
+---------------------+
27 rows in set (0.00 sec)


And a quick test to make sure you've loaded the data, run this:

select distinct(yearID) 
from Teams
order by yearID


Your results will look like this, except with all rows:
| 2009 | 
+--------+
139 rows in set (0.01 sec)


Step 4. Run a query.

Now let's run a simple query to see what colleges put out the most players making the most money, collectively:
select m.college, count(m.college), sum(s.salary)
 from Salaries 	s
 join Master 	m
 on m.playerID = s.playerID
 and s.yearID between '2002' and '2009'
GROUP by m.college
order by count(m.college)


Results:
| Cal St. Fullerton 	| 	30 | 117432409.00 | 
| Stanford 	| 	31 | 114652552.00 | 
| Arizona State 	| 	34 | 123698040.00 | 
| Tennessee 	| 	34 | 137204500.00 | 
| Georgia Tech 	| 	35 | 246225568.00 | 
| Louisiana State 	| 	37 |	98087290.00 | 
| UCLA 	| 	41 | 113552707.00 | 
| Florida 	| 	46 |	87119999.00 | 
| USC 	| 	54 | 261291719.00 | 
| None 	| 	651 | 1060121450.00 | 
| 	| 	3029 | 10591717817.00 | 
+------------------------------------------+------------------+----------------+
494 rows in set (0.58 sec)


That should get you going. Here is a quick visualization of this data.

Posted Image

Tags:
0 Subscribe


0 Replies