Marinomics: A Dismal Baseball Science

Monday, April 17, 2006

 

Power Surges Updated

In case anyone stumbles upon this from a Google search, something Malcolm Gladwell wrote made me update something I wrote a while ago.

This is my updated "Suspicious Seasons" list - players who enjoyed remarkable jumps in power figures (slugging and isolated power) compared to their career, up to that point. This list is a little different - it goes back to 1930. This is the Top 40 sorted on the average of the rank of their Unexpected SLG and Unexpected Isolated Power.

First Last year uxSLG uxIsoP
Barry Bonds 2001 0.296 0.258
Jeff Bagwell 1994 0.286 0.213
Champ Summers 1979 0.296 0.187
Randy Winn 2005 0.269 0.194
Brady Anderson 1996 0.244 0.197
Luis Gonzalez 2001 0.228 0.185
Bob Bailey 1970 0.226 0.191
Jim Spencer 1979 0.215 0.180
Jose Guillen 2003 0.231 0.155
Ken Caminiti 1996 0.218 0.158
Sammy Sosa 2001 0.214 0.159
John Lowenstein 1982 0.227 0.153
Javy Lopez 2003 0.209 0.163
Cito Gaston 1970 0.240 0.145
Jose Hernandez 1995 0.204 0.173
Barry Bonds 2004 0.210 0.146
Ivan Rodriguez 2000 0.202 0.155
Mark McGwire 1998 0.196 0.158
Mark McGwire 1996 0.208 0.148
Jeff Reed 1997 0.203 0.148
Ralph Kiner 1947 0.209 0.142
Felix Mantilla 1964 0.191 0.155
Barry Bonds 2002 0.214 0.136
Alex Ochoa 2000 0.187 0.148
Mark Bellhorn 2002 0.197 0.136
Richard Hidalgo 2000 0.193 0.138
Henry Rodriguez 1996 0.183 0.150
Troy Glaus 2000 0.190 0.141
Kevin Mitchell 1989 0.178 0.155
Gary Sheffield 1992 0.204 0.132
Larry Walker 1997 0.210 0.129
Rico Petrocelli 1969 0.192 0.136
Ray Boone 1953 0.190 0.139
Jhonny Peralta 2005 0.198 0.135
Willie Mays 1954 0.209 0.129
Al Kaline 1955 0.198 0.132
Reggie Jackson 1969 0.182 0.144
Mark McGwire 1995 0.177 0.152
Adrian Beltre 2004 0.201 0.128
Mike Epstein 1969 0.184 0.136
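
The sort described above (average of the two ranks) can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name is made up, the four rows are copied from the top of the table, and the ranks here are taken within that tiny sample rather than over every qualifying player-season since 1930. The post doesn't say how ties are broken, so tied values simply keep their input order.

```python
def sort_by_average_rank(seasons):
    """Order player-seasons by the mean of their uxSLG rank and uxIsoP rank.

    Rank 1 is the largest jump. Ties keep input order (an assumption).
    """
    def rank_on(field):
        ordered = sorted(seasons, key=lambda s: s[field], reverse=True)
        return {(s["name"], s["year"]): pos
                for pos, s in enumerate(ordered, start=1)}

    slg_rank = rank_on("uxSLG")
    iso_rank = rank_on("uxIsoP")

    def average_rank(s):
        key = (s["name"], s["year"])
        return (slg_rank[key] + iso_rank[key]) / 2

    return sorted(seasons, key=average_rank)


# Four rows from the table above; ranks are within this sample only.
sample = [
    {"name": "Barry Bonds", "year": 2001, "uxSLG": 0.296, "uxIsoP": 0.258},
    {"name": "Jeff Bagwell", "year": 1994, "uxSLG": 0.286, "uxIsoP": 0.213},
    {"name": "Champ Summers", "year": 1979, "uxSLG": 0.296, "uxIsoP": 0.187},
    {"name": "Randy Winn", "year": 2005, "uxSLG": 0.269, "uxIsoP": 0.194},
]
names = [s["name"] for s in sort_by_average_rank(sample)]
```

Averaging ranks rather than the raw values keeps either measure from dominating just because it has a wider spread.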

Persons with 2005 seasons in the Top 40:
Randy Winn (Go M's!), Jhonny Peralta
Almost made the list:
Derrek Lee, Tony Clark, Javier Valentin

But of course, this stuff goes the other way, too. The players who had the very worst 2005s, in terms of power outages, compared to their careers:

First Last year uxSLG uxIsoP
Sammy Sosa 2005 -0.168 -0.113
Brian Jordan 2005 -0.125 -0.087
Jeff Davanon 2005 -0.117 -0.085
Mike Lowell 2005 -0.117 -0.076
Bernie Williams 2005 -0.120 -0.069
Edgardo Alfonzo 2005 -0.091 -0.080
Magglio Ordonez 2005 -0.089 -0.084
Kevin Millar 2005 -0.093 -0.073
Ryan Klesko 2005 -0.097 -0.063
Jason Kendall 2005 -0.097 -0.062

Wednesday, June 01, 2005

 

Pitchers and League Switching

Acting on something Smokin' Joe Morgan said during the Dodgers game (Beltre is 'adjusting' to AL pitching) - a followup to the league switching thing I did on hitters. The same thing for pitchers (1950 forward with paired seasons of more than 50 innings pitched - sample ~9000 paired seasons with 900 or so involving league switches).

Equation:
ERA+ = League ERA/Pitcher's ERA
Change in ERA+ from Year 1 to Year 2 = Intercept + B1*([ERA+ in Year1] - 1) + B2*Age + B3*(Switched Leagues)


Coeff StdErr t-Stat P
Intercept -0.070 0.023 -3.067 0.002
ERA+-1 -0.721 0.010 -70.461 0.000
Age 0.004 0.001 5.416 0.000
Switch 0.012 0.011 1.012 0.312

There's even less evidence for a league switching effect here than there was for hitters (the t-statistic for the League Switching effect is closer to 0). In fact, the point estimate for switching leagues is slightly positive, though at p = 0.312 it is not statistically distinguishable from zero. As before, whatever league switching effect people perceive observationally is probably overwhelmed by mean reversion. If a pitcher is unbelievably good or bad, you'd be better served by assuming that his performance the next year will just be closer to average, irrespective of what league he's going to pitch in.
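
To see how much the mean reversion term dominates, here is a rough sketch that plugs the coefficients from the table into the equation. The function name and the example pitcher (ERA+ of 1.30, age 28) are hypothetical; ERA+ is on the ratio scale defined above, so 1.00 is league average.

```python
# Coefficients copied from the regression table above.
COEF = {
    "intercept": -0.070,
    "era_plus_minus_1": -0.721,
    "age": 0.004,
    "switch": 0.012,
}


def predicted_change(era_plus_year1, age, switched):
    """Predicted change in ERA+ from Year 1 to Year 2."""
    return (
        COEF["intercept"]
        + COEF["era_plus_minus_1"] * (era_plus_year1 - 1.0)
        + COEF["age"] * age
        + COEF["switch"] * (1 if switched else 0)
    )


# Hypothetical pitcher: 30% better than league in Year 1, age 28.
stay = predicted_change(1.30, 28, switched=False)
move = predicted_change(1.30, 28, switched=True)
reversion_part = COEF["era_plus_minus_1"] * (1.30 - 1.0)
```

For this made-up pitcher the mean reversion term alone is about -0.216, while switching leagues moves the prediction by only 0.012.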

Saturday, May 07, 2005

 

More on this season's attendance

I updated the attendance models through Wednesday's game and changed the structure a little bit with a few extra explanatory variables.

Date Opp Actual 04Mod 03Mod 3+4Mod Middle Diff Note
Mon 4/4 Minnesota 46,249 26,544 27,733 27,318 27,318 Opening Day
Tue 4/5 Minnesota 28,373 29,733 26,196 28,958 28,958 (585)
Wed 4/6 Minnesota 25,580 28,064 25,945 27,941 27,941 (2,361)
Fri 4/8 Texas 29,652 37,796 35,936 37,181 37,181 (7,529)
Sat 4/9 Texas 31,501 40,519 37,863 39,466 39,466 (7,965)
Sun 4/10 Texas 30,434 40,734 38,212 39,750 39,750 (9,316)
Wed 4/20 Oakland 24,841 31,523 30,647 30,493 30,647 (5,806)
Thu 4/21 Oakland 22,428 30,103 32,402 31,301 31,301 (8,873)
Fri 4/22 Cleveland 43,207 35,125 31,232 30,497 31,232 +11,975 Ichiro bobblehead
Sat 4/23 Cleveland 33,564 37,848 33,159 32,782 33,159 +405
Sun 4/24 Cleveland 32,889 38,062 33,508 33,066 33,508 (619)
Mon 5/2 LAAngels 24,184 37,278 45,652 39,515 39,515 (15,331)
Tue 5/3 LAAngels 29,917 40,467 44,915 41,154 41,154 (11,237)
Wed 5/4 LAAngels 26,303 38,798 44,664 40,137 40,137 (13,834)
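
Judging from the numbers, the Middle column is the median of the three model predictions and Diff is Actual minus Middle. A small sketch that reproduces two rows of the table (the function name is my own, not something from the models themselves):

```python
from statistics import median


def middle_and_diff(actual, mod04, mod03, mod_combined):
    """Return (median of the three predictions, actual minus that median)."""
    middle = median([mod04, mod03, mod_combined])
    return middle, actual - middle


# (Actual, 04Mod, 03Mod, 3+4Mod) copied from the table above.
rows = {
    "Wed 4/20 Oakland": (24841, 31523, 30647, 30493),
    "Fri 4/22 Cleveland": (43207, 35125, 31232, 30497),
}
results = {game: middle_and_diff(*vals) for game, vals in rows.items()}
```

Using the median rather than the mean keeps one model's outlier (like the 2003 model's fondness for the Angels) from dragging the benchmark around.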

The 2003 version likes the Angels quite a bit, since they were coming off the 2002 World Series.

Monday, April 25, 2005

 

So a jump in power numbers means...

In re: Michael Lewis' NY Times Magazine piece that mentions Beltre's jump...

Players who have enjoyed jumps in various power categories since 1980.

These lists were updated (mostly so Phil Bradley would get on one more, but also because of some stability patterns I saw), to all be the Top 25, Career AB >= 200, Season AB >= 200 (instead of Season AB >= 175). Added Isolated Power and Extra Bases. Added AB numbers to all of the lists to help temper some of them.

Name Year UxTB CrAB SeaAB
Brady Anderson 1996 141.5 3271 579
Barry Bonds 2001 141.1 7456 476
Luis Gonzalez 2001 139.0 5096 609
Sammy Sosa 2001 123.4 5893 577
Adrian Beltre 2004 120.0 2864 598
Larry Walker 1997 119.4 3132 568
Ken Caminiti 1996 119.2 3967 546
Robin Yount 1982 118.5 4212 635
Kirby Puckett 1986 118.2 1248 680
Sammy Sosa 1998 114.7 4021 643
Jeff Bagwell 1994 114.2 1675 400
Gary Sheffield 1992 113.7 1110 557
Richard Hidalgo 2000 107.5 656 558
Troy Glaus 2000 107.3 716 563
Ellis Burks 1996 105.8 3720 613
Aramis Ramirez 2001 103.7 561 603
Bret Boone 2001 102.9 3911 623
Ryne Sandberg 1984 101.9 1274 636
Mark McGwire 1998 100.0 4622 509
Henry Rodriguez 1996 97.6 766 532
Rich Aurilia 2001 97.2 1919 636
Kevin Mitchell 1989 96.9 1311 543
Javy Lopez 2003 95.7 3546 457
Paul Konerko 1999 94.8 224 513
Robin Yount 1980 94.7 3224 611

Name Year UxSLG CrAB SeaAB
Barry Bonds 2001 0.296 7456 476
Jeff Bagwell 1994 0.286 1675 400
Brady Anderson 1996 0.244 3271 579
Terry Shumpert 1999 0.236 884 262
Jose Guillen 2003 0.231 2050 315
Luis Gonzalez 2001 0.228 5096 609
John Lowenstein 1982 0.227 2548 322
Ken Caminiti 1996 0.218 3967 546
Barry Bonds 2002 0.214 7932 403
Sammy Sosa 2001 0.214 5893 577
Barry Bonds 2004 0.210 8725 373
Larry Walker 1997 0.210 3132 568
Javy Lopez 2003 0.209 3546 457
Albert Belle 1994 0.209 1881 412
Mark McGwire 1996 0.208 3659 423
Gary Sheffield 1992 0.204 1110 557
Jose Hernandez 1995 0.204 234 245
Jeff Reed 1997 0.203 2101 256
Ivan Rodriguez 2000 0.202 4443 363
Adrian Beltre 2004 0.201 2864 598
Mark Bellhorn 2002 0.197 323 445
Mark McGwire 1998 0.196 4622 509
Richard Hidalgo 2000 0.193 656 558
Hubie Brooks 1986 0.192 2648 306

Name Year UxIsoP CrAB SeaAB
Barry Bonds 2001 0.258 7456 476
Jeff Bagwell 1994 0.213 1675 400
Brady Anderson 1996 0.197 3271 579
Luis Gonzalez 2001 0.185 5096 609
Jose Hernandez 1995 0.173 234 245
Javy Lopez 2003 0.163 3546 457
Sammy Sosa 2001 0.159 5893 577
Ken Caminiti 1996 0.158 3967 546
Mark McGwire 1998 0.158 4622 509
Kevin Mitchell 1989 0.155 1311 543
Ivan Rodriguez 2000 0.155 4443 363
Jose Guillen 2003 0.155 2050 315
John Lowenstein 1982 0.153 2548 322
Mark McGwire 1995 0.152 3342 317
Pedro Feliz 2003 0.150 373 235
Henry Rodriguez 1996 0.150 766 532
Jeff Reed 1997 0.148 2101 256
Alex Ochoa 2000 0.148 1083 244
Mark McGwire 1996 0.148 3659 423
Barry Bonds 2004 0.146 8725 373
Jim Edmonds 1995 0.143 350 558
Phil Bradley 1985 0.142 389 641
Todd Hundley 1996 0.142 1468 540
Gary Sheffield 1994 0.141 2161 322

Name Year UxExB CrAB SeaAB
Barry Bonds 2001 122.8 7456 476
Brady Anderson 1996 114.2 3271 579
Luis Gonzalez 2001 112.4 5096 609
Kirby Puckett 1986 93.5 1248 680
Sammy Sosa 2001 91.7 5893 577
Phil Bradley 1985 90.7 389 641
Ken Caminiti 1996 86.4 3967 546
Jeff Bagwell 1994 85.2 1675 400
Kevin Mitchell 1989 84.3 1311 543
Robin Yount 1982 82.4 4212 635
Sammy Sosa 1998 82.2 4021 643
Robin Yount 1980 80.8 3224 611
Mark McGwire 1998 80.2 4622 509
Henry Rodriguez 1996 79.8 766 532
Jim Edmonds 1995 79.6 350 558
Troy Glaus 2000 79.4 716 563
Steve Finley 1996 78.4 3364 655
Richard Hidalgo 2000 77.1 656 558
Todd Hundley 1996 76.4 1468 540
Adrian Beltre 2004 76.4 2864 598
Jay Bell 1999 76.1 5651 589
Javy Lopez 2003 74.3 3546 457
Sammy Sosa 1999 74.0 4664 625
Gary Sheffield 1992 73.8 1110 557
Larry Walker 1997 73.1 3132 568


UxSLG = Unexpected Slugging Percentage = SLG in Year X - Career SLG
UxTB = Unexpected Total Bases = TB in Year X - Career SLG * AB in Year X
UxIsoP = Unexpected Isolated Power = IsoP in Year X - Career IsoP
UxExB = Unexpected Extra Bases = (TB - Hits) in Year X - Career IsoP * AB in Year X
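
The four definitions above translate directly into code. This sketch assumes IsoP is computed as (TB - Hits) / AB and that "career" means the seasons before Year X; the function name and the stat line are hypothetical, not a real player.

```python
def unexpected_power(season, career):
    """Return the four 'unexpected' power measures for one season.

    `season` and `career` are dicts with at-bats ("AB"), hits ("H") and
    total bases ("TB"); `career` covers the seasons before this one.
    """
    career_slg = career["TB"] / career["AB"]
    career_isop = (career["TB"] - career["H"]) / career["AB"]
    season_slg = season["TB"] / season["AB"]
    season_isop = (season["TB"] - season["H"]) / season["AB"]
    return {
        "UxSLG": season_slg - career_slg,
        "UxTB": season["TB"] - career_slg * season["AB"],
        "UxIsoP": season_isop - career_isop,
        "UxExB": (season["TB"] - season["H"]) - career_isop * season["AB"],
    }


# Hypothetical stat line, not a real player.
career = {"AB": 1000, "H": 270, "TB": 420}
season = {"AB": 500, "H": 150, "TB": 280}
ux = unexpected_power(season, career)
```

Note that UxTB is just UxSLG times the season's AB (and UxExB is UxIsoP times AB), which is why the counting lists favor full-time players and the rate lists let part-timers sneak on.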

Make sure you write your Congressman.

Thursday, April 21, 2005

 

How poor has the Mariners attendance been this season?

The spark for this was some notes in the paper about really low attendance figures. I'll compare this season's 6 home dates (7 with opening day) against predictions from models based on the 2004 season, the 2003 season and the 2004 and 2003 seasons together.

Date Opp 2004M 2003M CombM 3MAvg Actual Difference*
Mon 4/4 Min 46,249
Tue 4/5 Min 33,403 28,071 29,342 30,272 28,373 -1,899
Wed 4/6 Min 31,468 27,738 28,080 29,095 25,580 -3,515
Fri 4/8 Tex 38,040 36,658 37,527 37,408 29,652 -7,756
Sat 4/9 Tex 40,783 38,585 39,825 39,731 31,501 -8,230
Sun 4/10 Tex 40,998 38,934 40,109 40,014 30,434 -9,580
Wed 4/20 Oak 33,123 31,437 32,015 32,192 24,841 -7,351

* - Actual less 3 Model Avg

(The April predictions from the 2004 model are going to be higher because that team's attendance didn't nosedive until late in the season.)

Ignoring the opening day result, these figures are remarkably weak - sometimes shockingly so. The -9,580 for the Sunday game against Texas would fall among the worst 1% of outcomes from the models.

You could come up with a few storylines that have merit for this. The previous year's record makes the public unwilling to show up until a pattern of winning is established - which means that the April results would necessarily be low, but the August and September results could be much higher. The individual teams they are playing are just unattractive draws. The new park effect is finally gone (after 5 years!). Or you could say that the previous years' levels were just unsustainable.

The 2.5 million that management publicly projected for this year - down from 2.94 million in 2004 and 3.27 million in 2003 - would imply an average difference of around minus 6,600 per game from the 3 model average.

Sunday, April 17, 2005

 

The Tin-Foil Hat Brigade

Is it possible to uncover umpire biases? Could there be some grain of truth to the beer-soaked conspiracy theories that arise when you're at the ballpark - that this ump is a fan of the opponent, or he just hates our team?

Going through game log data back to 1970 - 70-some thousand games - I get the seasonal strikeout and walk rates per PA for teams, pitchers, and umpires. Then I apply these rates to the individual games and arrive at the excess ("unexplained") strikeouts and walks relative to the team, umpire and pitcher expectations. However, at least one of these (total team, pitcher or umpire strikeouts) should be close enough to zero that it's statistically insignificant. Why just one?

If there are just a lot of strikeouts to begin with, it could be that the team is merely prone to strikeouts.
If the team is striking out more than we'd expect, then the umpire might just call more strikeouts. As long as he is calling more strikes for everyone, I don't see a problem with this.
If the team is striking out more than we'd expect and there are more strikeouts than we'd expect from the umpire, maybe the pitchers that the team faces just throw a lot of strikeouts.
If we're still left with an unexplained amount of strikeouts and walks after considering the respective proclivities of the team, the umpire and the pitchers - then it begins to look kind of funny.

Then I start comparing it to the data from the opponent. Again, if these things are happening to both teams, I guess I just don't care that much. Subtracting out the opponent's excess strikeouts and walks removes the suspiciousness if it is actually happening to both teams, and therefore makes it more noticeable if it's only happening to one team.

Then I start adding game after game together. If a particular umpire is actually "for" or "against" a team, this pattern should reveal itself when we add all of these games together and the unusual results from umpires that are merely fair should disappear.
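
The bookkeeping in the last two steps (subtract the opponent's excess, then add up game after game) might look something like this. It is a simplified sketch, not the actual code behind the results: it handles strikeouts only, it assumes the team, pitcher and umpire tendencies have already been folded into a single expected strikeout rate per game, and every field name and number is made up.

```python
from collections import defaultdict


def net_excess_by_umpire_team(games):
    """Sum (team excess K - opponent excess K) per (umpire, team) pair.

    Excess K = actual strikeouts - plate appearances * expected K rate.
    The expected rates are inputs here (an assumption of this sketch).
    """
    totals = defaultdict(float)
    for g in games:
        team_excess = g["team_k"] - g["team_pa"] * g["team_exp_k_rate"]
        opp_excess = g["opp_k"] - g["opp_pa"] * g["opp_exp_k_rate"]
        totals[(g["umpire"], g["team"])] += team_excess - opp_excess
    return dict(totals)


# Two made-up games with the same (hypothetical) umpire and team.
games = [
    {"umpire": "Ump A", "team": "SEA", "team_pa": 40, "team_k": 9,
     "team_exp_k_rate": 0.15, "opp_pa": 38, "opp_k": 7,
     "opp_exp_k_rate": 0.20},
    {"umpire": "Ump A", "team": "SEA", "team_pa": 36, "team_k": 5,
     "team_exp_k_rate": 0.15, "opp_pa": 40, "opp_k": 9,
     "opp_exp_k_rate": 0.20},
]
net = net_excess_by_umpire_team(games)
```

For a fair umpire the per-game differences should wander around zero and mostly cancel; a total that keeps growing in one direction over many games is what would look funny.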

So are there some umpire biases? Well, yes, it is possible to tease this out. Restricting it to a 1% probability level and a reasonable number of games umpired (say X>50), there are 5 umpires whose results with a particular team seem unusual, either for or against that team - though only 1 of them is a current umpire. There were nearly 5,000 umpire/team combinations in total (with game totals ranging from 1 to 175).

I'm not going to "out" any umpire, since I don't think it's really fair. I'm a guy with a few databases, some graduate level stats work and a website; most of these guys were MLB umpires when I was learning long division. Further, there are some things about the approach that I don't really like and that I recognize are problems.

The biggest issues are that I want more pitcher data and that, to have any degree of statistical confidence in the umpire measurements, I need many, many games with the umpire/team combo. To get that, the umpire is necessarily going to have been around a while - and since the 1994 strike, a team gets only around 2 or 3 games with a particular umpire behind the plate each season.

Friday, April 15, 2005

 

Some recent economics working papers and journal articles on baseball

I've been slowly putting together the bones of a paper the past few weeks, so this seems appropriate.

The Effects of Labor Strikes on Consumer Demand: A Re-examination of Major League Baseball: Attendance levels adjusted for new stadiums have not returned to the pre-strike levels.
and
Striking Out? The Economic Impact of Major League Baseball Work Stoppages on Host Communities: The economic impact of baseball is (surprise!) inflated by chambers of commerce. Has 2 different methodologies, both pretty interesting - using the expected deviation in personal income during strike years to measure the economic activity that baseball represents and the ratio of income between MLB and non-MLB cities.

A Fall Classic? Assessing the Economic Impact of the World Series: Similar methodology as the strike paper - the post-season isn't an economic windfall.

Location and attendance in major league baseball: Not free. Builds an attendance model that incorporates the distance between teams and new teams. Has a couple of very interesting implications for the WaNats and Orioles - like decreasing the Orioles' attendance by around 125,000.

Reputation Effects in Gold Glove Award Voting: Award models are hard to do. Develops different models for each of the infield positions. Suggests that Gold Glove voters use a player's past GG wins when voting.

Convergence and clustering in major league baseball: the haves and have nots?
Also not free. Uses a clustering algorithm to chop up the teams into "haves" and "have nots" to test some ideas from the Blue Ribbon Panel and some earlier research that contends baseball is more competitively balanced. Over the long haul, they say that the clustering hypothesis works but over the recent past, where revenue figures are available, it doesn't.

And an older one that is a very interesting idea:
Using Baseball Card Prices to Measure Star Quality and Monopsony: Also not free. It uses baseball card prices as a proxy for how the public values a player based on statistical appeal and "star quality". It uses an attendance model that includes "star quality" - which is the sum of the errors from the card price equation (which also uses a white/non-white ethnic dummy variable, which turns out to be insignificant). I've tried using things like salaries and salaries above a threshold to capture that, but never pulled out the stacks of Beckett that were sitting under my bed as a kid.

