Wednesday, January 28, 2009

Sabermetrics

Bill James, of the Society for American Baseball Research (SABR), is known as the founder of sabermetrics. James defined sabermetrics as “the search for objective knowledge about baseball.” In layman’s terms, sabermetrics are more sophisticated statistics designed to carry more meaning than the traditional ones.

But do sabermetrics really have more significance than regular statistics?

Let’s start off with double plays. The number of double plays a batter hits into can be very deceiving. Double-play opportunities (DP_OPPS) reveal more than the double-play count alone. These opportunities are at-bats with fewer than two outs and a runner on first: first only, first and second, first and third, or the bases loaded. Double play percentage (DP%) is the sabermetric statistic that gives the share of double-play opportunities that ended in double plays.

Let’s compare Padres’ first baseman Adrian Gonzalez and Royals’ designated hitter Billy Butler. Gonzalez grounded into 28 double plays in 2008 while Butler hit into 24. The typical person would say Gonzalez hit into double plays more often, but that person would be wrong. Gonzalez hit into a double play in 16.3% of his opportunities. Butler, on the other hand, did so in 25.3% of his.
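The comparison can be checked with a few lines of Python. This is only a sketch: the opportunity totals for the two hitters are not listed here, so the `opps` values below are back-calculated from the quoted percentages (an assumption), and the function name `dp_pct` is made up for illustration.

```python
def dp_pct(double_plays: int, opportunities: int) -> float:
    """Share of double-play opportunities that ended in a double play."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return double_plays / opportunities


# Opportunity counts are NOT published figures; they are implied by
# dividing each double-play total by the quoted percentage and rounding.
hitters = {
    "Gonzalez": {"dp": 28, "opps": round(28 / 0.163)},
    "Butler": {"dp": 24, "opps": round(24 / 0.253)},
}

for name, line in hitters.items():
    pct = dp_pct(line["dp"], line["opps"])
    print(f"{name}: {line['dp']} DP in about {line['opps']} chances = {pct:.1%}")
```

Butler had fewer double plays but far fewer chances, which is why his rate comes out higher.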

Clearly, sabermetric statistics have an advantage over regular statistics here, but how useful are sabermetrics on defense?

Errors and fielding percentage are black and white; they do not tell the whole story of a player’s fielding ability. That is where sabermetrics step in.

Zone rating (ZR) is a little more complicated. ZR divides the baseball field into “zones” where balls are hit in play. The rating is the percentage of balls hit into a player’s zone that he fielded. This is one way of gauging a player’s range on the field, but there is another that builds on zone rating.

Ultimate zone rating (UZR) compares a fielder to the league-average fielder at his position. The comparison is expressed as the number of runs the player prevented relative to that league-average player.

One of the best statistical ways to gauge a fielder’s range from his fielding chances is range factor (RF). Consider Adrian Gonzalez, the 2008 Gold Glove winner, and Cardinals’ Albert Pujols. The two first basemen had identical fielding percentages (.996), but Pujols’ range factor was 10.61, leading all first basemen. Gonzalez’s RF was 9.12, fifteenth in baseball in 2008. By that statistic, it is a mystery why Gonzalez won the Gold Glove.
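As a rough illustration of how range factor is computed (see the RF formula at the end of the post), here is a minimal Python sketch. The function name and the stat line are hypothetical; they are not Gonzalez’s or Pujols’ actual putouts, assists, or innings.

```python
def range_factor(putouts: int, assists: int, innings: float) -> float:
    """Range factor per nine innings: 9 * (PO + A) / innings in the field."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * (putouts + assists) / innings


# Hypothetical first baseman (made-up numbers, for illustration only).
rf = range_factor(putouts=1300, assists=100, innings=1260.0)
print(f"RF = {rf:.2f}")
```

Because RF only counts plays made per nine innings, two fielders with the same fielding percentage can end up far apart, as Pujols and Gonzalez did.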

Fielding runs above average (FRAA) and fielding runs above replacement (FRAR) both estimate the number of runs a fielder has prevented. FRAA compares a fielder to an average fielder, whereas FRAR compares him to a replacement-level player.

Sabermetrics can help determine a hitter’s true power. The traditional statistic is slugging percentage (SLG) and the sabermetric statistic is isolated power (ISO). The difference is that ISO counts only the extra bases from extra-base hits (XBH), whereas SLG also gives credit for singles.

This time, Braves’ third baseman Chipper Jones and Phillies’ first baseman Ryan Howard will be evaluated. Jones ranked fifth in baseball in SLG (.574) and Howard ranked seventeenth (.543). ISO ranks them much differently. Howard’s ISO (.292) was tied for fifth in baseball in 2008, a very high number for ISO. Jones was not in the top 25, or the top 50 for that matter; he was tied for 74th in ISO (.210). Jones benefited from a lot of singles. In contrast, Howard hit more home runs (48) than Jones had XBH (47). Howard had 78 XBH over the course of the season.
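One way to see why singles matter so much here: with the SLG, ISO, and AVG formulas listed at the end of the post, ISO works out to exactly SLG minus AVG. The Python sketch below checks that identity on a hypothetical stat line (made-up numbers) and then applies it to the SLG and ISO figures quoted above. The function names are illustrative only.

```python
def slg(singles, doubles, triples, homers, at_bats):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


def iso(doubles, triples, homers, at_bats):
    """Isolated power: extra bases per at-bat (singles contribute nothing)."""
    return (doubles + 2 * triples + 3 * homers) / at_bats


def avg(singles, doubles, triples, homers, at_bats):
    """Batting average: hits per at-bat."""
    return (singles + doubles + triples + homers) / at_bats


# Hypothetical hitter (made-up numbers): 100 1B, 30 2B, 2 3B, 20 HR in 500 AB.
s = slg(100, 30, 2, 20, 500)
i = iso(30, 2, 20, 500)
a = avg(100, 30, 2, 20, 500)
print(f"SLG {s:.3f} - AVG {a:.3f} = {s - a:.3f}, ISO = {i:.3f}")

# Applying ISO = SLG - AVG to the quoted figures gives each hitter's implied AVG.
implied_avg_jones = round(0.574 - 0.210, 3)
implied_avg_howard = round(0.543 - 0.292, 3)
print(implied_avg_jones, implied_avg_howard)
```

The gap between the two implied averages is the singles effect: much of Jones’s slugging percentage came from simply getting hits, not from extra bases.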

Runs batted in (RBI) is considered one of the most overrated statistics in sports. Others batted in (OBI) is very similar to RBI, except that it does not count the batter driving himself in with a home run. OBI is the sabermetric stat while RBI is the everyday stat.

Ryan Howard will be the test subject again, but this time he will go up against Twins’ first baseman Justin Morneau. Howard’s RBI total was 146 in 2008 and Morneau’s was 129. Their OBI totals tell a different story: Howard totaled 98 OBI, but Morneau totaled 106. OBI accounted for 67.1% of Howard’s RBI total, meaning 32.9% of his RBI came from driving himself in. For Morneau, only 17.8% of his RBI came from driving himself in. In other words, 82.2% of Morneau’s RBI total was via OBI.
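The percentages above follow directly from OBI = RBI - HR. A small Python sketch, with illustrative function names; Howard’s 48 home runs are quoted earlier in the post, while Morneau’s home run total is not stated and is inferred here as 129 - 106 = 23 (an assumption implied by the quoted RBI and OBI figures).

```python
def obi(rbi: int, home_runs: int) -> int:
    """Others batted in: RBI minus the runs the batter scored on his own homers."""
    return rbi - home_runs


def obi_share(rbi: int, home_runs: int) -> float:
    """Fraction of a hitter's RBI that drove in someone other than himself."""
    return obi(rbi, home_runs) / rbi


# (RBI, HR). Morneau's HR count is inferred from his quoted RBI and OBI totals.
players = {"Howard": (146, 48), "Morneau": (129, 23)}

for name, (rbi, hr) in players.items():
    share = obi_share(rbi, hr)
    print(f"{name}: {obi(rbi, hr)} OBI, {share:.1%} of RBI, {1 - share:.1%} self-driven")
```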

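There is batting average (BA or AVG) and there is batting average on balls in play (BABIP). BABIP looks only at balls put into the field of play, so it helps gauge how much luck and defense influenced a hitter’s or pitcher’s results. For example, consider Orioles’ outfielder Nick Markakis in the 2008 season. Markakis’ batting average was .306, above average. His BABIP, however, was .350, a significant difference. Since league-wide BABIP usually sits near .300, a gap like that is a signal to look at how much of a hitter’s average came from balls finding holes.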
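To make the difference between AVG and BABIP concrete, here is a minimal Python sketch using the AVG and BABIP formulas listed at the end of the post. The stat line is hypothetical (made-up numbers, not Markakis’ actual 2008 totals), and the function names are illustrative.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """AVG: hits per at-bat."""
    return hits / at_bats


def babip(hits: int, home_runs: int, at_bats: int,
          strikeouts: int, sac_flies: int) -> float:
    """BABIP: (H - HR) / (AB - HR - K + SF).

    Home runs and strikeouts are removed because the defense never gets
    a chance on them; sacrifice flies are added back because they are
    balls in play that do not count as at-bats.
    """
    balls_in_play = at_bats - home_runs - strikeouts + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical hitter: 180 H, 20 HR, 600 AB, 100 K, 5 SF.
hitter_avg = batting_average(180, 600)
hitter_babip = babip(180, 20, 600, 100, 5)
print(f"AVG {hitter_avg:.3f}, BABIP {hitter_babip:.3f}")
```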

Equivalent average (EqA) is an offensive stat that measures a player’s offensive value per out. EqA is adjusted for the difficulty of the league, the park, and the opposing pitching, and it accounts for both hitting and base running. EqA results are expressed using the “Stars and Scrubs Chart,” which places each player in a category according to his EqA. A player with an EqA less than .230 would be considered a scrub, between .250 and .280 a regular, between .280 and .300 a star, and anything above .300 a superstar. Granted, the chart does have flaws, but it gives a general idea of how a hitter performed. Red Sox second baseman Dustin Pedroia would rank as a star (.298 EqA) whereas teammates J.D. Drew and Kevin Youkilis would both rank as superstars (.314 and .313 respectively).
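The chart can be written as a simple lookup. The Python sketch below is one possible reading of it, with two assumptions flagged in the comments: a value sitting exactly on a boundary goes to the higher band (except .300, which stays a star because only values above .300 are superstars), and the range from .230 up to .250 gets no label because the chart as described here does not name one.

```python
from typing import Optional


def stars_and_scrubs(eqa: float) -> Optional[str]:
    """Map an EqA to its "Stars and Scrubs" label, or None if unlabeled."""
    if eqa > 0.300:
        return "superstar"
    if eqa >= 0.280:      # assumption: .280 itself counts as a star
        return "star"
    if eqa >= 0.250:      # assumption: .250 itself counts as a regular
        return "regular"
    if eqa < 0.230:
        return "scrub"
    return None           # .230 to .250: no label given in the post


for name, eqa in [("Pedroia", 0.298), ("Drew", 0.314), ("Youkilis", 0.313)]:
    print(f"{name}: {eqa:.3f} -> {stars_and_scrubs(eqa)}")
```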

Now, let’s turn to pitching. Earned run average (ERA) is also a deceiving statistic. Thankfully, sabermetrics offers several alternatives to ERA to help measure a pitcher’s skill.

Rays’ pitcher Matt Garza posted a 3.70 ERA in the 2008 season. That would be considered pretty good, but sabermetrics paints a less flattering picture. Garza’s NRA (normalized runs allowed; runs allowed scaled to a league where 4.50 is average) was 4.00. His DERA (defense-adjusted ERA; a defense-independent version of NRA) was 4.38. Fielding-independent pitching (FIP) does not favor Garza either: his FIP was approximately 4.17.
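For readers who want to try ERA and FIP themselves, here is a minimal Python sketch based on the ERA and FIP formulas listed at the end of the post, including the 3.2 constant used there. The pitcher’s line is hypothetical (made-up numbers, not Garza’s actual totals), and the function names are illustrative.

```python
def era(earned_runs: int, innings: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    return earned_runs * 9 / innings


def fip(home_runs: int, walks: int, hit_by_pitch: int, intentional_walks: int,
        strikeouts: int, innings: float, constant: float = 3.2) -> float:
    """Fielding-independent pitching, using the constant from this post.

    Only outcomes the defense cannot affect are counted: home runs,
    unintentional walks plus hit batters, and strikeouts.
    """
    unintentional = walks + hit_by_pitch - intentional_walks
    return (13 * home_runs + 3 * unintentional - 2 * strikeouts) / innings + constant


# Hypothetical pitcher. Innings must be true decimals
# (184 and 2/3 innings is 184.667, not the scorebook-style 184.2).
pitcher_era = era(earned_runs=80, innings=200.0)
pitcher_fip = fip(home_runs=20, walks=60, hit_by_pitch=5,
                  intentional_walks=3, strikeouts=150, innings=200.0)
print(f"ERA {pitcher_era:.2f}, FIP {pitcher_fip:.2f}")
```

A FIP well above the ERA, as in this made-up line, is the same pattern described for Garza: the results were better than the walks, strikeouts, and home runs alone would predict.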

Diamondbacks’ pitcher Brandon Webb is the next ERA example. Webb’s ERA in 2008 was 3.30. His QERA (Quik-ERA; an ERA estimate based on a pitcher’s groundball rate, walk rate, and strikeout rate) was 3.47: not bad, but a little worse than his regular ERA. His component ERA (ERC or CERA), on the other hand, was 3.04, meaning that by this measure he pitched better than his ERA suggests.
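Component ERA is the hardest formula in this post to compute by hand, so here is a hedged Python sketch of the ERC formula listed at the end. The stat line is hypothetical (made-up numbers, not Webb’s actual totals), and the function and variable names are illustrative.

```python
def component_era(hits: int, walks: int, hit_by_pitch: int, home_runs: int,
                  intentional_walks: int, batters_faced: int,
                  innings: float) -> float:
    """ERC: an ERA estimate built from hits and walks allowed, not actual runs."""
    # Weighted bases allowed: non-homer hits, homers, and unintentional free passes.
    weighted_bases = (0.89 * (1.255 * (hits - home_runs) + 4 * home_runs)
                      + 0.56 * (walks + hit_by_pitch - intentional_walks))
    baserunners = hits + walks + hit_by_pitch
    return baserunners * weighted_bases / (batters_faced * innings) * 9 - 0.56


# Hypothetical pitcher: 200 H, 60 BB, 5 HBP, 20 HR, 3 IBB, 850 BFP, 200 IP.
erc = component_era(hits=200, walks=60, hit_by_pitch=5, home_runs=20,
                    intentional_walks=3, batters_faced=850, innings=200.0)
print(f"ERC {erc:.2f}")
```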

But two other sabermetric stats truly judge a player’s importance: wins above replacement player (WARP) and value over replacement player (VORP).

WARP measures a player’s importance to his team as the number of wins he is worth compared to a replacement player. WARP combines BRAR (batting runs above replacement), PRAR (pitching runs above replacement, for pitchers), and FRAR (fielding runs above replacement).

But VORP is the stat that best values a player’s individual performance. VORP compares the runs a player contributed with what a replacement player would have contributed at the same position in the same number of plate appearances.

Albert Pujols led all of baseball in VORP in 2008 with a value of 98.6. How good is that? The player with the second-highest VORP, Hanley Ramirez, had a VORP of 78.6.

Speaking of Hanley Ramirez, he finished 11th in the NL MVP voting in 2008. Playing on a small-market team that was not in playoff contention did not help his chances of becoming MVP, but he should have been ranked much, much higher. Ramirez was 2nd in baseball in VORP. The actual second-place finisher in the MVP race, Ryan Howard, finished 46th in VORP in 2008 (36.4). The voters looked only at Howard’s HR and RBI totals to say that he was MVP-worthy. Milwaukee Brewers’ third baseman Ryan Braun finished 3rd in the MVP voting. Braun’s candidacy rested on leading the Brewers to the playoffs and on his impressive 37 home runs. However, Braun’s VORP ranked 29th in baseball (15th in the NL).

Regular, everyday statistics are very deceiving. Sabermetrics help define a player’s true value and are an enhanced way of determining talent.

Formulas:
DP% (Double Play Percentage): DP/DP_OPPS
RF (Range Factor): (9*(PO + A))/Innings in field
SLG (Slugging Percentage): TB/AB
ISO (Isolated Power): (2B + (3B*2) + (HR*3)) / AB
OBI (Others Batted In): RBI-HR
AVG (Batting Average): H/AB
BABIP (Batting Average on Balls in Play): (H-HR)/ (AB-HR-K+SF)
EqA (Equivalent Average): (H+TB+ (1.5*(BB+HBP)) +SB)/ (AB+BB+HBP+CS+ (SB/3))
ERA (Earned Run Average): (ER*9)/IP
FIP (Fielding Independent Pitching): ((HR*13+ (BB+HBP-IBB)*3-K*2)/IP) +3.2
ERC (Component ERA): (((H+BB+HBP)*(.89*(1.255*(H-HR) +4*HR) +.56*(BB+HBP-IBB)))/ (BFP*IP))*9-.56
