Baseline cherrypicker

Baseball edition

Statements of the form "Jack Morris won more games in the 1980s than anyone else" are fascinating. Although they're true, they rest on cherry-picked years that may or may not illustrate a deeper truth in context. (And we see them all the time: see my college degrees cherry-picker for another area.) For baseball, there are thousands of statements just like the ones here that you can make about any single cumulative stat over the game's history--10,296, to be exact. Printed out, all the statements you could make with the data here (which now includes individual franchise and league leaderboards) would take about 120,000 pages, single-spaced. This visualization lets you home in on the patches of interest.
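
For the curious, here is one way a figure like 10,296 can arise. If every statement is pinned down by a start season and an end season (with the start no later than the end), then n seasons give n(n+1)/2 possible spans. The short Python sketch below counts them. The season count of 143 is an assumption picked because it reproduces the number above; nothing here confirms that this is how the figure was actually computed.

```python
def count_windows(num_seasons):
    """Count (start, end) season pairs where start <= end."""
    # A span starting at position `start` can end at any of the
    # remaining seasons, including the starting season itself.
    return sum(num_seasons - start for start in range(num_seasons))


# 143 is an assumed season count, chosen only because it matches the
# figure quoted in the text; it is not stated anywhere in the data notes.
print(count_windows(143))  # prints 10296
```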

If you just run your mouse over the chart and read the text that pops up, you'll start to get the general idea. (On a phone or tablet, try tapping.) Or see below for a fuller explanation.

Longer explanation: the x axis shows the starting year for any stat; the y axis shows the length of time being measured. So, for example, if you go down 7 cells from "1940," it will look up the player who led the league in WAR for the 7 years following 1940, and show the sentence "Ted Williams led the majors with 48.28 WAR from 1940 to 1947."
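
The lookup behind each cell can be sketched in a few lines of Python. This is a minimal illustration, not the code that drives the chart: the function names, the data layout, and the toy numbers are all made up for the example. It also assumes the span runs from the start year through the start year plus the number of cells, inclusive, which is the reading that matches "from 1940 to 1947" for 7 cells down.

```python
from collections import defaultdict

# Hypothetical toy data: (player, season) -> stat value. These numbers
# are invented for illustration and are not real statistics.
TOY_STATS = {
    ("Player A", 1940): 5,
    ("Player B", 1940): 6,
    ("Player A", 1941): 7,
    ("Player B", 1941): 3,
}


def leader(stats, start_year, cells_down):
    """Return (player, total) for seasons start_year..start_year + cells_down."""
    end_year = start_year + cells_down  # assumed inclusive end of the span
    totals = defaultdict(int)
    for (player, season), value in stats.items():
        if start_year <= season <= end_year:
            totals[player] += value
    if not totals:
        return None  # no data falls inside this span
    # Highest cumulative total wins the cell.
    return max(totals.items(), key=lambda item: item[1])


def describe(stats, start_year, cells_down, stat_name):
    """Build the pop-up sentence for one cell, or None if the cell is empty."""
    found = leader(stats, start_year, cells_down)
    if found is None:
        return None
    player, total = found
    end_year = start_year + cells_down
    return (f"{player} led the majors with {total} {stat_name} "
            f"from {start_year} to {end_year}.")


print(describe(TOY_STATS, 1940, 1, "WAR"))
```

With the toy data, Player B leads the single 1940 season, but Player A leads once 1941 is added to the span, which is exactly the kind of shift the chart makes visible.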

Sometimes you might want, instead, to know who led the league in the 7 years BEFORE 1947. Click on the word "start year" just above the graph to toggle the display so that it shows all the periods that end in a certain year (which makes it easier to see who has the most hits over the last 5 years, 10 years, and so forth). If you look only at the top row, you'll see the annual leaders; if you look only at the leftmost column, you'll see the equivalent of a progressive leaderboard; and if you've toggled, the rightmost column will show the leaders as you go farther back into the past. In between are all the other sequential periods: the leaders by decade, or presidential administration, and so forth. What I'm interested in is how the emergence of larger regions helps us contextualize those claims I was talking about at the start.

A couple of baseball notes here. My impetus was to show the arbitrariness of that claim about Jack Morris in the 1980s, which has been ubiquitous in discussions of his Hall of Fame case. For that purpose, this chart is a failure. In fact, I think it shows that Morris's win totals were actually quite impressive: 1980-1989 is far from the only timespan in which Morris led the league in wins. (For instance: Jack Morris led the majors with 254 wins between 1975 and 1999.) Not that pitcher wins are a great stat--but Morris's dominance in them matches up with Hall of Famers like Juan Marichal, Burleigh Grimes, and Hal Newhouser, and looks more impressive than that of the winningest pitcher of the 2000s, Andy Pettitte.

A couple of possibilities for expansion suggest themselves. I could include leaders by position; somehow bring in a third dimension to show the near-leaders in each period; or break it down to include periods shorter than a year ("David Ortiz led the majors with 12 home runs between April 3 and August 13"). Visually, I think the last would be the most interesting, since it would cut out the blocks and give truly smooth lines; but it would require a whole new algorithm to calculate times.

The team stats are interesting (for instance, it never occurred to me that Pittsburgh has had far more than its share of triples hitters), and can be pretty illuminating for long-term franchise histories.