When a batter steps into the batter’s box, what odds does he have of contributing something, anything, to his team’s run probability?
Kind of a weird question. But it’s interesting to look at. I looked at Tangotiger’s Run Probability values, sorted by base-out state (http://www.tangotiger.net/re24.html) and I tried to use them to find something useful.
It’s not a great name, but I’m calling it eRPA: Expected Run Probability Added. It’s not really accurate. I came to my values using the equation (OBE)(O)(TM)/PA^2, where OBE/PA is your on-base ratio, O/PA is your out ratio, and TM is the Tango Matrix number for a given base-out state. Basically, this multiplies your on-base ratio (not OBP; it doesn’t include sacrifice flies in the denominator) times your out ratio, times the Tango Matrix number for that state. Then I took the average of all 24 values to give a single value. I’ll discuss why this needs a LOT of work at the end.
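To make the arithmetic concrete, here is a minimal Python sketch of that calculation as I read it. The function name, the argument names, and the flat 0.5 placeholder matrix are all made up for illustration; the real Tango Matrix numbers would have to come from the linked table.

```python
from statistics import mean


def erpa(on_base_events, outs, plate_appearances, run_prob_by_state):
    """Average of (OBE)(O)(TM)/PA^2 over the supplied base-out states.

    run_prob_by_state is a sequence of run-probability values, one per
    base-out state (24 in the full matrix).
    """
    if plate_appearances <= 0:
        raise ValueError("plate_appearances must be positive")
    if not run_prob_by_state:
        raise ValueError("run_prob_by_state must not be empty")

    on_base_ratio = on_base_events / plate_appearances
    out_ratio = outs / plate_appearances

    # One value per base-out state, then a plain (unweighted) average.
    per_state = [on_base_ratio * out_ratio * tm for tm in run_prob_by_state]
    return mean(per_state)


# Placeholder inputs only: NOT real player stats or real matrix values.
placeholder_matrix = [0.5] * 24
example_value = erpa(200, 300, 500, placeholder_matrix)
```

One thing this makes visible: the on-base and out ratios are the same in every state, so the 24-state average works out to those two ratios times the mean of the matrix values.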
Why no sac flies? Because, even without data in hand, I’m pretty sure that a ridiculously low number of total fly-outs are sac flies. If you have more info on that, send me a link; I’d love to see that data.
Anyway, the equation’s pretty general. But the values we get from that are interesting.
As of August 9th (yes, I know, that was almost 2 months ago, but Baseball-Reference subscriptions aren’t free and don’t last forever), the leader in eRPA was Jose Altuve, with 11.89%. Second was Aaron Judge, with 11.84%. The rest of the top 10, in order, were Paul Goldschmidt, Joey Votto, Daniel Murphy, Joey Votto, Domingo Santana, DJ LeMahieu, Anthony Rizzo, and Kris Bryant.
Bottom 10: Danny Espinosa, 8.03%, Ryan Goins, 8.56%, Jorge Polanco, Austin Hedges, Alcides Escobar, Trevor Plouffe, Tim Anderson, Rougned Odor, Matt Davidson, Jose Iglesias.
So next question. What do these numbers mean? And can we take anything away from them?
My honest answer is that I’m not sure.
Judge took the league by storm; he and Altuve are probably the top two MVP choices this year. Goldschmidt belongs in the MVP discussion, Votto has an insane all-time OBP, and Bryant, even in a “down year” (meaning only 60 RBIs, which doesn’t mean much), ranks 10th. So on a basic level, the players we seem to consider as top-tier are part of the top tier of eRPA.
Leading the pack in career eRPA are Ted Williams, with 11.83%, and Barry Bonds, with 11.71%. Right after them comes this guy named…Ferris Fain? I’d never heard of him. But he had a lifetime OBP of .424 and averaged almost 130 walks for his 8-year career. So not a bad offensive player. Also coming in the top 10 of career eRPA are Babe Ruth (with the caveat that I was only able to use the stats from Baseball-Reference), Manny Ramirez, and Frank Thomas; Joey Votto makes it in there, as do Jeff Bagwell and Rickey Henderson. So again, these are all legendarily-good hitters.
So we can pretty reasonably assume, without any data, that this correlates pretty well with oWAR; eRPA is an offensive stat, but like BABIP, it can be used for pitchers, too. I think. Haven’t tested a kind of “defensive eRPA” out yet.
To test this theory for oWAR, I took all of my 1,800+ career eRPA values and put them on the x-axis, put each player’s respective oWAR/PA on the y-axis (to put them both on the same scale), and took the r^2 value. I got .45. Not too great.
But given this eye-test correlation between eRPA and “good hitting”, it’s kind of weird that we only get an r^2 of .45 between eRPA and oWAR. I would think at least .6 or .7.
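For anyone who wants to rerun that comparison, here is a small sketch of the r^2 calculation in plain Python. The function name and the toy lists are made up for illustration; the real inputs would be the career eRPA and oWAR/PA columns. Worth keeping in mind when reading the result: r^2 is the square of the correlation coefficient r, so an r^2 of .45 corresponds to an r of roughly .67.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and cross-deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")

    return (sxy * sxy) / (sxx * syy)


# Toy data only (not real eRPA or oWAR/PA values): a perfectly linear pair.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 4.0, 6.0, 8.0]
toy_r2 = r_squared(toy_x, toy_y)

# Converting an r^2 back to the magnitude of r is just a square root.
r_from_r2 = math.sqrt(0.45)
```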
Quick note: I used Baseball-Reference’s stats for all of these calculations.
Does it benefit contact, or high-walk, or power hitters? Take a look at the top-10 again. We’ve got 50-dongs Judge, .414 OBP Altuve, and 130-walks Joey Votto. So I think it’s pretty fair to say (again, without any data to back it up) that eRPA is a catchall stat. Similar to oWAR.
The next step, I think, would be to sort by WPA/PA. I still don’t know how much of a difference that would make but .45 is a good start. That said, this is still pretty early in development and I think that more changes to the formula are necessary before a new comparison. Alternatively (if I can find it), wOBA, wRC, or xwOBA would be good values to plot.
How can eRPA be improved? There are a couple of things I can think of right now.
Step 1, I think, would be to break each player’s season or career down by plate appearances, in terms of base-out state. Then I’d use the statistics from that specific base-out state to calculate a value for that specific state. And then, to create a more accurate picture, I’d weight all 24 values relative to the total plate appearances. That would probably improve the stat. No clue by how much though; this seems like a fine start. Only problem with this is that it turns eRPA into MUCH more of a descriptive statistic (at the moment it’s kind of a combination of descriptive and predictive), and also limits eRPA to a base-out based stat, while currently, it’s much more general.
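A rough sketch of that Step 1 idea in Python, assuming you already have per-state splits for a player. Everything here is hypothetical: the function name, the state labels, the (OBE, outs, PA) tuple layout, and the numbers in the example are placeholders, not real splits or real matrix values.

```python
def weighted_erpa(splits, run_prob_by_state):
    """PA-weighted average of per-state (OBE)(O)(TM)/PA^2 values.

    splits maps a base-out state label to (on_base_events, outs, pa)
    recorded in that state; run_prob_by_state maps the same labels to
    the run-probability value for that state.
    """
    weighted_sum = 0.0
    total_pa = 0
    for state, (on_base_events, outs, pa) in splits.items():
        if pa == 0:
            continue  # no plate appearances in this state, nothing to add
        state_value = on_base_events * outs * run_prob_by_state[state] / pa ** 2
        weighted_sum += state_value * pa  # weight by PAs seen in this state
        total_pa += pa

    if total_pa == 0:
        raise ValueError("no plate appearances in any state")
    return weighted_sum / total_pa


# Placeholder numbers for two made-up state labels.
toy_splits = {"empty_0_out": (2, 2, 4), "loaded_2_out": (1, 3, 4)}
toy_matrix = {"empty_0_out": 0.5, "loaded_2_out": 1.0}
toy_value = weighted_erpa(toy_splits, toy_matrix)
```

Unlike the current version, each state here uses its own on-base and out ratios, so a hitter who performs differently with runners on would get a different number than the flat 24-state average gives him.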
I guess I could also use better statistics in the formula. Instead of OBE, maybe I could take away HBPs. Off the top of my head, I don’t know if skill at getting hit by pitches translates to a higher OBP at minimum–they’re so unlikely that I can’t imagine that it impacts it too much. I originally had RBI in the equation for eRPA but took that out, so I can see taking HBP out as well.
It’s also totally possible that my formula as a whole is messed up. It’s not just a rate statistic–it’s a rate statistic with a squared term on the bottom. I don’t know how to improve the formula–but trial and error is a great way to start.