Baseline cherrypicker

Baseball edition

Statements of the form "Jack Morris won more games in the 1980s than anyone else" are fascinating. Although they're true, they rest on cherry-picked years that may or may not illustrate a deeper truth in context. (And we see them all the time: see my college degrees cherry-picker for another area.) For baseball, there are thousands of statements just like the ones here that you can make about any single stat over the game's history--10,731, to be exact. Printed out, all the statements you could make with the data here would take about 120,000 pages, single-spaced. This visualization lets you home in on the patches of interest.

If you just run your mouse over the chart and read the text that pops up, you'll start to get the general idea. (On a phone or tablet, try tapping.) Or see below for a fuller explanation.

You can now also click on the x-axis or y-axis to turn the individual cells of just one slice of the chart, horizontal or vertical, into bars that show the size of the quantities. This answers questions like "who has the record for most home runs in an 8-year period?"

Longer explanation: the x-axis shows the starting year for any stat; the y-axis shows the length of time being measured. So, for example, if you go down 7 cells from "1940," it will look up the player who led the league in WAR for the 7 years following 1940, and show the sentence "Ted Williams led the majors with 48.28 WAR from 1940 to 1947."
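
For the programmers in the audience, here is a rough sketch of the lookup behind each cell. It is not the code that runs this page: the players and totals are made up, the function names are hypothetical, and I am assuming a span of N cells means N seasons counted from the start year (ties simply go to whichever player comes up first).

```python
from collections import defaultdict

# Hypothetical season totals, (player, year) -> home runs. Made-up numbers.
SEASONS = {
    ("Player A", 1940): 30, ("Player A", 1941): 10, ("Player A", 1942): 25,
    ("Player B", 1940): 20, ("Player B", 1941): 35, ("Player B", 1942): 5,
}

def leader(seasons, start, length):
    """Return (player, total) for the best total over `length` seasons from `start`."""
    totals = defaultdict(int)
    for (player, year), value in seasons.items():
        # Assumption: the span covers `length` seasons, start year included.
        if start <= year < start + length:
            totals[player] += value
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])

def leader_ending(seasons, end, length):
    """Same lookup, but for the span of `length` seasons that ends in `end`."""
    return leader(seasons, end - length + 1, length)

def leader_grid(seasons, first_year, last_year):
    """Build every cell of the chart: (start year, length) -> (player, total)."""
    grid = {}
    for start in range(first_year, last_year + 1):
        for length in range(1, last_year - start + 2):
            grid[(start, length)] = leader(seasons, start, length)
    return grid

grid = leader_grid(SEASONS, 1940, 1942)
player, total = grid[(1940, 2)]
print(f"{player} led with {total} home runs over 2 seasons starting in 1940.")
```

Every sentence the chart can produce is one entry in a grid like this, which is why the count of possible statements grows so quickly with the number of seasons.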

Sometimes you might want, instead, to know who led the league in the 7 years BEFORE 1947. Click on the words "start year" just above the graph to toggle the display so that it shows all the periods that end in a certain year (which makes it easier to see who has the most hits over the last 5 years, 10 years, and so forth). If you look only at the top row, you'll see the annual leaders; if you look only at the leftmost column, you'll see the equivalent of a progressive leaderboard; and if you've toggled, the rightmost column will show the leaders as you go farther back into the past. In between are all the other sequential periods: the leaders by decade, or presidential administration, and so forth. What I'm interested in is how the emergence of larger regions helps us contextualize those claims I was talking about at the start.

Rate stats are calculated with a cutoff that ranges from 480 AB (batters) or 480 batters faced (pitchers) up to about 4,800 AB over a career, since the Hall of Fame cutoff is 10 seasons. Since there isn't really any standard I know of for "qualifying AB" over intermediate periods, I interpolate between these using an exponential decay function. So to appear on the charts you must have 480 AB in season 1, 912 AB (or 432 more) over two seasons, 1300 (or 388 more) over 3 seasons, and so forth. For BA and other hitting stats where players don't qualify, I add enough hitless at bats to their totals to make them qualify when creating the rankings, just as in normal BA calculations.

A couple of baseball notes here. My impetus was to show the arbitrariness of that claim about Jack Morris in the 1980s, which has been ubiquitous in discussions of his Hall of Fame case. For that purpose, this chart is a failure. In fact, I think it shows that Morris's win totals were actually quite impressive: 1980-1989 is far from the only timespan in which Morris led the league in wins. (For instance: Jack Morris led the majors with 254 wins between 1975 and 1999.) Not that pitcher wins are a great stat--but Morris's dominance in them matches up with Hall of Famers like Juan Marichal, Burleigh Grimes, and Hal Newhouser, and looks more impressive than that of the winningest pitcher of the 2000s, Andy Pettitte.
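
To make the qualifying cutoffs for rate stats concrete, here is a small sketch that reproduces the numbers quoted above (480, then 912, then 1300). The 0.9 decay factor is my reading of those figures (432 is 90% of 480, and 388 is roughly 90% of 432), not a constant taken from the chart's code, and the rounding down is likewise an assumption. The function names are hypothetical.

```python
FIRST_SEASON_AB = 480   # single-season cutoff quoted in the text
CAREER_LIMIT_AB = 4800  # what the cumulative cutoff approaches over a long career

def qualifying_ab(seasons):
    """Cumulative at bats needed to qualify over `seasons` consecutive seasons.

    Assumption: each extra season adds 90% of the previous season's requirement,
    so the total is 4800 * (1 - 0.9**seasons). Integer arithmetic keeps the
    result exact, and fractions are rounded down.
    """
    return CAREER_LIMIT_AB * (10**seasons - 9**seasons) // 10**seasons

def padded_average(hits, at_bats, seasons):
    """Batting average after adding hitless at bats up to the qualifying cutoff."""
    required = qualifying_ab(seasons)
    denominator = max(at_bats, required)  # padding only ever adds outs, never hits
    return hits / denominator if denominator else 0.0

for n in (1, 2, 3):
    print(n, qualifying_ab(n))

# A hitter with 150 hits in 400 AB over one season is ranked as if he had 480 AB.
print(round(padded_average(150, 400, 1), 4))
```

Under these assumptions a player who falls short of the cutoff still shows up in the rankings, just with an average dragged down by the phantom outs, while anyone past the cutoff keeps his real average.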

A couple of possibilities for expansion suggest themselves. I could include leaders by position; somehow bring in a third dimension to show the near-leaders in each period; or break it down to include periods shorter than a year ("David Ortiz led the majors with 12 home runs between April 3 and August 13"). Visually, I think the last would be the most interesting, since it would cut out the blocks and give truly smooth lines, but it would require a whole new algorithm to calculate times.

The team stats are interesting (for instance, it never occurred to me that Pittsburgh has had far more than its share of triples hitters), and can be pretty illuminating for long-term franchise histories.