## Reader Comments and Retorts


It's in the player value tab, just like for every other pitcher.

Heh. I asked Bill James this question at the SABR convention in 1985. He said he'd tried to work it out, but was unable to. Still seems like there should be something there.

One thing you can do is use plate appearances to estimate innings, but I reject that because it works really poorly given defensive replacements and strategic pinch hitting. I prefer to use defensive chances to estimate innings. But that runs into problems: say a guy who played 10 games had a lot of chances, and the system says he had 13 equivalent games. That's not possible, so I cap him at 10 and then do another iteration for the rest of the players. I never published the estimated innings because I don't trust them at all. They're just a stand-in, and I hope some day Retrosheet can give us more accurate actual innings played (they've done this for much of the 1940s, though some games are missing).

For quite a few seasons back (further than the PBP years), Retrosheet has numbers for games, games started, and complete games at a given position. I played around with turning those numbers into innings estimates a while back, albeit over a pretty small number of years. It worked out to be approximately:

Innings played (as a fraction of team innings) = {2/9*(games played but not started) + 7/9*(games started but not completed) + 1*(complete games)}/(team games)

There was some slight variation in the coefficients across positions, but not more than a couple of percentage points, if memory serves.
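The formula above can be sketched in a few lines of Python. This is only an illustration: the function name, the argument names, and the sample player are made up, and the 2/9 and 7/9 weights are the approximate coefficients quoted in the post, which reportedly vary by a couple of percentage points across positions.

```python
def estimated_innings_share(games, games_started, complete_games, team_games,
                            sub_weight=2 / 9, partial_start_weight=7 / 9):
    """Estimate a player's share of team innings at one position.

    The weights default to the approximate coefficients from the post.
    All names here are hypothetical, chosen for illustration.
    """
    if not (0 <= complete_games <= games_started <= games):
        raise ValueError("expected complete_games <= games_started <= games")

    games_as_sub = games - games_started             # played but not started
    partial_starts = games_started - complete_games  # started but not completed

    weighted_games = (sub_weight * games_as_sub
                      + partial_start_weight * partial_starts
                      + 1.0 * complete_games)
    return weighted_games / team_games


# Hypothetical regular: 150 games, 145 starts, 130 complete games, 154 team games.
share = estimated_innings_share(150, 145, 130, 154)
print(f"{share:.3f}")
```

A player who starts and finishes every team game gets a share of exactly 1, which is a quick sanity check on the weights.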

I understand vaguely that LF is less scarce than CF, so that's OK. What I'm not getting is why there is anything other than 0 for the team value here. Since every team plays all 9 positions equally, why should there be any adjustment needed for positional scarcity?

I am looking at the 1933 NYG right now; they get 48 runs here for Rpos. Why? It's not like the NYG were playing CF more than any other team.

So I guess it's that they had better players in key defensive positions?

Under Team Stats > Team Player Value Batters they have Rbat, Rbaser, Rfield, and Rpos; I add all of those to get wins above average (WAA) for the position players.

And then under Team Player Value for the pitchers I look up WAA for pitchers.

So if I add them both together, shouldn't I get something close to their wins, or more likely their Pythagorean wins?

For NL pennant winners in the 1930s, this is very close or dead on for six of those teams. For the other four it seems a bit off.

The four that seem a bit off:

| Year | Team | Actual wins | Pyth wins | WAA |
| --- | --- | --- | --- | --- |
| 1930 | STL | 92 | 94 | 11.5 |
| 1931 | STL | 101 | 97 | 11.6 |
| 1935 | CHI | 100 | 101 | 19.0 |
| 1939 | CIN | 97 | 95 | 13.9 |

for an average prediction error of 5.75 games. All four teams did play 154 games.

That seems like too much error for this type of system. Maybe I am missing a park factor, but I think that has already been taken into account.

For the other six teams, I think the method works very well though, but here....
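The 5.75 figure can be reproduced with a short Python check. One assumption is inferred from the numbers rather than stated in the post: the prediction for each team is 77 wins (half of a 154-game schedule) plus WAA, and the error is Pythagorean wins minus that prediction.

```python
# Assumption (inferred, not stated in the post): an average team wins
# half of a 154-game schedule, so the prediction is 77 + WAA.
LEAGUE_AVERAGE_WINS = 154 / 2

# (year, team, pythagorean wins, WAA) for the four teams listed above.
teams = [
    (1930, "STL", 94, 11.5),
    (1931, "STL", 97, 11.6),
    (1935, "CHI", 101, 19.0),
    (1939, "CIN", 95, 13.9),
]


def prediction_error(pyth_wins, waa, average_wins=LEAGUE_AVERAGE_WINS):
    """Pythagorean wins minus the WAA-based prediction."""
    return pyth_wins - (average_wins + waa)


errors = [prediction_error(pyth, waa) for _, _, pyth, waa in teams]
mean_error = sum(errors) / len(errors)

for (year, team, _, _), err in zip(teams, errors):
    print(year, team, round(err, 1))
print("mean error:", round(mean_error, 2))
```

Under that assumption the four errors are 5.5, 8.4, 5.0, and 4.1, which average to the 5.75 quoted above.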

Also, why is Rbat so low? I think for almost every team but the Cubs it is in negative territory. What is going on with that?

Pitchers.

Again, pitchers.

The league average that everyone is compared to is batting with pitchers excluded. That is important after 1973 to make sure Cal Ripken and Mike Schmidt are on equal footing. But then I end up with pitchers being some big negative number for rbat. It's balanced out by crediting pitchers in the rpos column.

I used to have the batting portion reconcile to actual runs scored. That is no longer used on the site; at some point Sean Forman switched to Pete Palmer's batting runs. So I'm guessing those teams that are off scored more runs than would be expected from their offensive inputs.

QUESTION: Are the Rfield factors based on the Total Zone method or not? I think I've asked this a couple of times and am just not sure whether it has been answered. This is doubly confusing because in the last post AROM says that they are using Pete Palmer's batting runs, which is fine, I suppose, because the problem I am having is with the defensive numbers. So just to be clear: it is Sean Smith's Total Zone method that is used for Rfield, yes?

It looks very much like there is a systemic error in the relationship between the defensive runs that Rfield is assigning to the various fielders. It is not just random but rather at the high end of the curve it is pushing defensive numbers down and at the low end, it is pushing them up. So great fielders are being diminished and bad fielders are being helped up. More or less the bell curve is being pushed in at both ends.

I am doing this by hand, so I'll post what I come up with in a day or two.

Are the Rfield factors based on the Total Zone method or not? I think I've asked this a couple of times and am just not sure whether it has been answered. This is doubly confusing because in the last post AROM says that they are using Pete Palmer's batting runs, which is fine, I suppose, because the problem I am having is with the defensive numbers. So just to be clear: it is Sean Smith's Total Zone method that is used for Rfield, yes?

TotalZone is used for the seasons in which we have play-by-play data but not video-based numbers, so roughly from the mid-'50s to the mid-2000s. Before the PBP data starts, the numbers are an adjusted range factor and will be considerably less reliable, which is probably why they're regressed to the mean more heavily.

Teddy is on Mt. Rushmore basically because if it wasn't for Teddy, there wouldn't be a Mt. Rushmore. So that analogy isn't always going to work for the "Mt. Rushmore of ..." but I suppose the Mt. Rushmore of shortstops should have whoever it was who figured out that putting an infielder between 2B and 3B was a good idea.

A related issue: has anyone done any comparison of defensive efficiency vs. "luck" (+ or - in actual wins vs. Pythagorean wins)?

Looking at the 1930s and then briefly at the millennium, I think there might be some relationship. Not 1:1, but there seem to be some interesting numbers there.

KEY for each column: Year/Team/Pythag Wins/Total Wins Above Average/Difference between Pythag and WAA value

NOTE: under pythag wins, if a team played fewer than 154 games this is noted as "less X games". I then added one or two wins to the total wins to account for this, e.g. for 2 fewer games I might add one to go from 89 to 90 wins. No attempt was made to turn these into fractions. Total WAA is found by adding Rfield, Rbaser, Rbat, and pitching WAA for each team. Under the difference, a positive number means the team did better than the WAA value suggests, and a negative number means their pythag record is worse than the WAA value suggests.
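For comparison, here is a minimal Python sketch of the fractional adjustment that the note says was not attempted: pro-rating Pythagorean wins to a 154-game schedule instead of adding one or two wins by hand. This is not the poster's method. The function names are hypothetical, and the baseline of 77 wins for an average team is an assumption inferred from the Difference column.

```python
def prorate_wins(wins, games_played, full_season=154):
    """Scale a win total from a short schedule up to a full season."""
    return wins * full_season / games_played


def difference_from_waa(wins_full_season, total_waa, full_season=154):
    """Full-season wins minus (average-team wins + WAA).

    Assumes an average team wins half its games (77 of 154).
    """
    return wins_full_season - (full_season / 2 + total_waa)


# 1938 CHI from the first-place table: 88 pythag wins, "less 2" games, WAA 11.5.
prorated = prorate_wins(88, 154 - 2)
print(round(prorated, 1))                             # pro-rated wins
print(round(difference_from_waa(prorated, 11.5), 1))  # fractional difference
```

For this row the pro-rated total is about 89.2 wins and the difference about 0.7, against 89 wins and 0.5 with the hand adjustment, so the choice of adjustment shifts individual rows by only a few tenths of a win.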

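1st place teams year by year, by pythag wins

| Year | Team | Pythag wins | Total WAA | Difference |
| --- | --- | --- | --- | --- |
| 1939 | CIN | 95 | 13.9 | 4.1 |
| 1938 | CHI | 88 (less 2) | 11.5 | 0.5 |
| 1937 | NYG | 89 (less 2) | 12.4 | 0.6 |
| 1936 | CHI | 93 | 14.7 | 1.3 |
| 1935 | CHI | 101 | 19.0 | 5 |
| 1934 | NYG | 95 (less 1) | 13.7 | 5.3 |
| 1933 | NYG | 90 (less 2) | 11.6 | 2.4 |
| 1932 | CHI | 86 | 7.6 | 1.4 |
| 1931 | STL | 97 | 11.6 | 8.4 |
| 1930 | STL | 94 | 11.5 | 5.5 |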

avg error 3.45 games per team/season (the mean of the ten differences listed above)

I.e., this is how much the team outperformed (by pythag wins) the prediction via the metric.

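2nd place teams

| Year | Team | Pythag wins | Total WAA | Difference |
| --- | --- | --- | --- | --- |
| 1930 | BRO | 89 | 14.6 | -2.6 |
| 1931 | NYG | 93 (less 2) | 13.7 | 3.3 |
| 1932 | NYG | 82 | -1.6 | 6.6 |
| 1933 | CHI | 90 | 12.9 | 0.1 |
| 1934 | STL | 90 (less 1) | 12 | 2 |
| 1935 | STL | 96 | 11.5 | 7.5 |
| 1936 | NYG | 89 | 11.8 | 0.2 |
| 1937 | CHI | 89 | 12.4 | -0.4 |
| 1938 | CIN | 84 (less 4) | 8.6 | 0.4 |
| 1939 | STL | 91 (less 1) | 16.7 | -1.7 |

avg error: 1.5 games...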

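last place teams

| Year | Team | Pythag wins | Total WAA | Difference |
| --- | --- | --- | --- | --- |
| 1930 | CIN | 59 | -8.2 | -10 |
| 1931 | BOS | 60 | -8.4 | -8.6 |
| 1932 | CIN | 62 | -3.3 | -11.7 |
| 1933 | CIN | 58 (less 2) | -11.7 | -6.3 |
| 1934 | CIN | 55 (less 3) | -15 | -6 |
| 1935 | BOS | 50 (less 1) | -20.6 | -6.4 |
| 1936 | PHI | 64 | -6.6 | -6.4 |
| 1937 | BRO | 61 (less 1) | -11.4 | -5.6 |
| 1938 | PHI | 47 (less 4) | -23.7 | -5.3 |
| 1939 | PHI | 47 (less 3) | -23.8 | -5.2 |

avg error -7.1 per team/season; i.e. they performed worse than the metric suggests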

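next to last place

| Year | Team | Pythag wins | Total WAA | Difference |
| --- | --- | --- | --- | --- |
| 1930 | PHI | 60 | -21 | +4 |
| 1931 | CIN | 61 | -10.8 | -5 |
| 1932 | STL | 74 | -3.6 | 0.6 |
| 1933 | PHI | 61 (less 2) | -12.1 | -2.9 |
| 1934 | PHI | 64 (less 5) | -6.6 | -4.4 |
| 1935 | PHI | 60 (less 1) | -13.7 | -3.3 |
| 1936 | BRO/BOS | 68 | -3.9 | -5.1 (average of both teams) |
| 1937 | PHI | 64 (less 1) | -12.4 | -0.6 |
| 1938 | BOS | 69 (less 2) | -1.5 | -5.5 |
| 1939 | BOS | 66 (less 3) | -5.9 | -1.1 |

avg -2.3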

This is Sean's response to my question in 155 about why Rbat is in negative territory for nearly every team.

With all due respect to Sean, this can't be entirely what is going on because the Cub numbers are too far out of whack. Here are the Cubs Rbat for the 1930s:

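| Year | Rbat |
| --- | --- |
| 1930 | 44 |
| 1931 | 62 |
| 1932 | -48 |
| 1933 | 4 |
| 1934 | -40 |
| 1935 | 27 |
| 1936 | -21 |
| 1937 | 44 |
| 1938 | -41 |
| 1939 | -54 |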

Keep in mind the league is about -40 (going from memory; I didn't check all), whereas the Cubs here have 4 seasons where they are like 70 to 100 runs above the league average. How can that possibly be coming from the pitcher slot? Pitchers only bat 1/9 of the time, so if the team averaged say 80 runs above the league Rbat, then I guess every pitcher would have to be like 9 runs better than the average MLB hitter?? It would be off the charts...

Something else is going on here.
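The gap being described can be tabulated directly. A small Python sketch, using the Cubs Rbat values listed above and the poster's from-memory league figure of about -40 per team (an approximation, not a looked-up value):

```python
# Cubs team Rbat by season, as listed in the post.
cubs_rbat = {
    1930: 44, 1931: 62, 1932: -48, 1933: 4, 1934: -40,
    1935: 27, 1936: -21, 1937: 44, 1938: -41, 1939: -54,
}

# Assumption: the poster's from-memory estimate of the average team Rbat.
LEAGUE_AVG_TEAM_RBAT = -40

# Runs above (positive) or below (negative) the average team.
gap = {year: rbat - LEAGUE_AVG_TEAM_RBAT for year, rbat in cubs_rbat.items()}

for year, runs in gap.items():
    print(year, f"{runs:+d}")
```

With that baseline the largest gaps are 1930, 1931, 1935, and 1937, the seasons in which the Cubs' team Rbat is well into positive territory.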

Unless you look at the entire league, you can't really say that what Sean said is wrong. I'm not sure what you are trying to say here, but the sum of rbat across all the teams for a season in the 1930s routinely finished around -800 to -900, roughly -50 runs per team on average. Positive rbat is coming from the other slots, not the pitching slot.

I am referring to the team Rbat; it appears under Team Batting Value or some such. You yourself state that the average team Rbat for the 1930s is -50. The Cubs routinely seem to be 70-100 points better in half those seasons. I do not understand how that is possible: since the pitching staff can only provide 1/9 of the ABs, it seems there are not enough ABs to create so large a differential.

Sean had previously stated that the large negative was due to comparing team Rbat (which includes pitchers' ABs) to position players' ABs, and that was the reason.

If you still don't get it, ask again and I will try to explain. I thought I was clear.

Unless maybe I am looking at the wrong category?? Was I looking at the pitchers' Rbat only? But I don't think so. You seem to agree that the average Rbat for the entire team is -50, so we seem to agree....

He set up a standard across the league that compares all (non-pitcher) hitters to league average. This is the baseline he uses for rbat. If you remove every pitcher's at bat from the team totals for team rbat, you would end up with a league-wide score of right around zero (rounding issues and other things could keep it from being exactly zero). Then he figures out pitchers' hitting relative to league average. Obviously they are going to be pretty poor and create a negative number.

On average they seem to come out around 50 runs below a league-average hitter over the course of the season. This doesn't seem unreasonable.

Look at a league average hitter, which would be someone with around 0 rbat, and compare to an elite level hitter and you get around 50-100 rbat. It's not unreasonable at all to think that a pitcher is 50 runs worse than a league average hitter over 154 games.

http://www.baseball-reference.com/teams/CHC/1935.shtml

At the bottom is Team Player Value Batters; that includes all the batters (Phil Cavarretta, Tex Carleton, Cuyler, everyone), not just pitchers. There is a total for everyone (I presume that's why it's at the bottom of the column) scored as 35. This value fluctuates wildly for the Cubs from year to year. I don't get it.

I don't know what I am missing. I am confused again...

Because some years they have very good hitters and some they don't? Why would it be stable?

Give me a specific season that you are questioning. The rbat is based upon a league-average hitter (with pitchers' offense removed). It's not based upon the team level but on the entire league, so some teams will have positive numbers if they are very good offensive teams.

So I didn't mess up the calc in post 166 then; good.

1. Remove all pitchers plate appearances from the data

2. Figure out what "average" is.

3. Create a formula that rates average on a runs scale where average equals 0.

4. Apply that formula to each individual player to create a run value relative to average of zero. At this point in time they also apply it to individual pitchers.

This is why on the overall team level the rbat of all players add up to negative numbers. But it doesn't necessarily have to. The 1935 Cubs led the league in scoring so you would expect that they would have a plus team rbat.
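The four steps above can be sketched in Python with toy numbers. Everything here is hypothetical: the `Batter` fields, the made-up two-team league, and the use of a simple runs-per-plate-appearance rate as the run estimate. The site's actual calculation (Pete Palmer's batting runs, per the earlier comment) is more involved; this only illustrates why team totals can go negative while non-pitchers sum to zero.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    name: str
    team: str
    plate_appearances: int
    offensive_runs: float  # hypothetical run estimate, invented for illustration
    is_pitcher: bool


def league_runs_per_pa(batters):
    """Steps 1-2: drop pitchers, then find the league-average rate."""
    hitters = [b for b in batters if not b.is_pitcher]
    return (sum(b.offensive_runs for b in hitters)
            / sum(b.plate_appearances for b in hitters))


def rbat(batter, baseline):
    """Steps 3-4: runs relative to an average hitter (average = 0)."""
    return batter.offensive_runs - baseline * batter.plate_appearances


def team_rbat(batters, team, baseline):
    return sum(rbat(b, baseline) for b in batters if b.team == team)


# Toy league: each "batter" stands for a whole group of players.
batters = [
    Batter("A position players", "A", 5400, 740.0, False),
    Batter("A pitchers", "A", 500, 20.0, True),
    Batter("B position players", "B", 5400, 580.0, False),
    Batter("B pitchers", "B", 500, 15.0, True),
]

baseline = league_runs_per_pa(batters)
for team in ("A", "B"):
    print(team, round(team_rbat(batters, team, baseline), 1))
```

In this toy league the non-pitchers sum to zero by construction, the league total is negative because pitchers are rated against a hitters-only baseline, and the strong-hitting team A still comes out positive.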

Click here for 1935 league data for player value to see how all the teams do on rbat.

