Step 1. Get the Data.
This url has an excellent baseball archive that can be loaded into a MySql database. Once you down load it, unarchive it.
http://www.baseball-...9-11-25.sql.zip
Once it is unarchived it will be called: BDB-sql-2009-11-25.sql
Step 2. Create a Database for putting your data into.
I'll assume you have downloaded a working copy of the MySQL database tools that are right for your computer.
Once you have it all installed, and you have given yourself permission to create a database, type this:
mysql> GRANT ALL ON bbdatabank.* TO 'mikeh'@'localhost' IDENTIFIED BY 'p@ssw0rd';
Obviously use your login name and password.
mysql> CREATE DATABASE bbdatabank; Query OK, 1 row affected (0.00 sec) mysql> quit; Bye
Step 3. Import the Database
Remember the file name BDB-sql-2009-11-25.sql, well you need to also remember where you unarchived it on your hard drive. Cause you are now going to import it. I'll use the user name and password from above to import this data.
at your shell [terminal window], or command file system type:
mysql -u mikeh -p -s bbdatabank < BDB-sql-2009-11-25.sql
{hit Enter} password: P@ssw0rd
This will now load your data.
Step 4. Verify you got everything:
mysql> show tables; +-------------------------+ | Tables_in_bbdatabank | +-------------------------+ | Allstar | | AllstarFull | | Appearances | | AwardsManagers | | AwardsPlayers | | AwardsShareManagers | | AwardsSharePlayers | | Batting | | BattingPost | | Fielding | | FieldingOF | | FieldingPost | | HOFold | | HallOfFame | | Managers | | ManagersHalf | | Master | | Pitching | | PitchingPost | | Salaries | | Schools | | SchoolsPlayers | | SeriesPost | | Teams | | TeamsFranchises | | TeamsHalf | | xref_stats | +---------------------+ 27 rows in set (0.00 sec)
And a quick test to make sure you've loaded the data, run this:
select distinct(yearID) from Teams order by yearID
Your results will look like this, except with all rows:
| 2009 | +--------+ 139 rows in set (0.01 sec)
Step 4. Run a query.
Now let's run a simple query to see what colleges put out the most players making the most money, collectively:
select m.college, count(m.college), sum(s.salary) from Salaries s join Master m on m.playerID = s.playerID and s.yearID between '2002' and '2009' GROUP by m.college order by count(m.college)
Results:
| Cal St. Fullerton | 30 | 117432409.00 | | Stanford | 31 | 114652552.00 | | Arizona State | 34 | 123698040.00 | | Tennessee | 34 | 137204500.00 | | Georgia Tech | 35 | 246225568.00 | | Louisiana State | 37 | 98087290.00 | | UCLA | 41 | 113552707.00 | | Florida | 46 | 87119999.00 | | USC | 54 | 261291719.00 | | None | 651 | 1060121450.00 | | | 3029 | 10591717817.00 | +------------------------------------------+------------------+----------------+ 494 rows in set (0.58 sec)
That should get you going. Here is a quick visualization of this data.

Help






