Statements of the form "Jack Morris won more games in the 1980s
than anyone else" are fascinating.
While they're true, they rest on cherry-picked years that may or may
not illustrate a deeper truth in context. (And we see them all the time: see my college degrees cherry-picker for another area.)
For baseball, there are thousands of statements
just like the ones here that you can make about any single
stat over the game's history--10,731, to be exact.
Printed out, all the statements you could make with the data here
would take about 120,000 pages, single-spaced.
This visualization lets you home in on the patches of interest.
If you just run your mouse over the chart and read the text that pops up, you'll start to get the general idea. (On a phone or tablet, try tapping.)
Or see below for a fuller explanation.
You can now also click on the x-axis or y-axis to turn the individual cells of just one slice of the chart, horizontal or vertical, into bars that show the size of the quantities. This answers questions like "Who has the record for most home runs in an 8-year period?"
Longer explanation: the x-axis shows the starting year for
any stat; the y-axis shows the length of time being
measured. So, for example, if you go down 7 cells from "1940,"
it will look up the player who led the league in WAR for the 7
years following 1940, and show the sentence "Ted Williams led
the majors with 48.28 WAR from 1940 to 1947."
Sometimes you might want, instead, to know who led the league in the 7 years BEFORE 1947.
Click on the words "start year" just above the
graph to toggle the display so that it shows all the
periods that end in a certain year
(which makes it easier to see who has the most hits over the
last 5 years, 10 years, and so forth).
If you look only at the top row, you'll see the annual leaders;
if you look only at the leftmost column, you'll see the
equivalent of a progressive
leaderboard; and if you've toggled, the rightmost column will show the leaders as you go farther back into the past.
In between are all the other sequential periods:
the leaders by decade, or presidential administration, and so on.
What I'm interested in is how the emergence of larger regions helps us
contextualize those claims I was talking about at the start.
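The grid described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind the chart: the `totals` layout, the function names, and the toy players "A" and "B" are all hypothetical, and a span of `length` seasons is taken to mean `start` through `start + length - 1` inclusive.

```python
from collections import defaultdict

def span_leader(totals, start, length):
    """Return (player, total) for the leader over `length` seasons from `start`.

    `totals` maps (player, year) -> counting-stat value (hypothetical layout).
    The window is start .. start + length - 1, inclusive (an assumption).
    """
    sums = defaultdict(int)
    for (player, year), value in totals.items():
        if start <= year < start + length:
            sums[player] += value
    if not sums:
        return None
    return max(sums.items(), key=lambda kv: kv[1])

def span_leader_ending(totals, end, length):
    """Same lookup, but for the `length` seasons that END in `end` (the toggled view)."""
    return span_leader(totals, end - length + 1, length)

def cell_count(seasons):
    """Number of (start year, length) cells that fit inside `seasons` seasons."""
    return seasons * (seasons + 1) // 2

# Made-up numbers, purely for illustration.
toy = {
    ("A", 2001): 10, ("B", 2001): 12,
    ("A", 2002): 15, ("B", 2002): 5,
    ("A", 2003): 1,  ("B", 2003): 20,
}
print(span_leader(toy, 2001, 1))         # top row: the single-season leader
print(span_leader(toy, 2001, 2))         # one cell further down
print(span_leader_ending(toy, 2003, 2))  # "end year" view
```

One side observation: `cell_count(146)` is 10,731, which matches the number of statements quoted earlier and would be consistent with 146 seasons of data. That season count is an inference, not something the text states.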
Rate stats are calculated with a cutoff that ranges from 480 AB (batters) or 480
batters faced (pitchers) in a single season up to about 4,800 AB over a career, since the Hall of
Fame cutoff is 10 seasons. Since there isn't really any standard I know of for "qualifying AB"
over intermediate periods, I interpolate between these using an exponential decay function.
So to appear on the charts you must have 480 AB in season 1,
912 (or 432 more) over two seasons, 1,300 (or 388 more) over three seasons, and so forth.
For BA and other hitting stats where players don't qualify, I add enough hitless at-bats
to their totals to make them qualify before creating the rankings, just as in normal AB calculations.
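The three cutoffs quoted above (480, 912, 1,300) are consistent with each added season requiring 90% of the previous season's increment, which gives a ceiling of 480 / (1 - 0.9) = 4,800. The sketch below assumes exactly that; the 0.9 factor and both function names are reverse-engineered guesses for illustration, not the formula actually used for the chart.

```python
from fractions import Fraction
from math import floor

def qualifying_cutoff(seasons, first_season=480, decay=Fraction(9, 10)):
    """Assumed cutoff: a geometric series of per-season increments.

    Season 1 needs `first_season`; each later season adds `decay` times the
    previous increment. Fractions keep the arithmetic exact before flooring.
    """
    if seasons < 1:
        raise ValueError("seasons must be at least 1")
    ceiling = first_season / (1 - decay)  # 4800 with the default values
    return floor(ceiling * (1 - decay ** seasons))

def padded_average(hits, at_bats, seasons):
    """Batting average after padding with hitless at-bats up to the cutoff."""
    denominator = max(at_bats, qualifying_cutoff(seasons))
    return hits / denominator

print([qualifying_cutoff(n) for n in (1, 2, 3)])  # [480, 912, 1300]
print(padded_average(150, 400, 1))                # 0.3125 (150 / 480)
```

Under this assumption the cutoff approaches 4,800 without ever reaching it, which fits "about 4,800 AB over a career."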
A couple of baseball notes. My impetus here was to show the
arbitrariness of that claim about Jack Morris in the 1980s,
which has been ubiquitous in discussions of his hall of fame
case. For that purpose, this chart is a failure.
In fact, I think it shows that Morris's win totals were actually
quite impressive: 1980-1989 is far from the only timespan in
which Morris led the league in wins.
(For instance: Jack Morris led the majors with 254 wins between
1975 and 1999.)
Not that pitcher wins are a great stat--but Morris's dominance
in them matches up with hall of famers like Juan Marichal,
Burleigh Grimes, and Hal Newhouser, and looks more impressive
than the winningest pitcher of the 2000s, Andy Pettitte.
A couple of possibilities for expansion suggest themselves.
I could include leaders by position; somehow bring in a third
dimension to show the near-leaders in each period; or break it
down to include periods shorter than a year ("David Ortiz led the
majors with 12 home runs between April 3 and August
13"). Visually, I think the last would be the most interesting,
since it would cut out the blocks and give truly smooth lines,
but it would require a whole new algorithm to calculate times.
The team stats are interesting (for instance, it never occurred
to me that Pittsburgh has had far more than its share of triples hitters), and can be pretty illuminating for long-term franchise histories.