Humanities Data Analysis

Benjamin MacDonald Schmidt, Assistant Professor of History, Northeastern University

May 2015


  1. Humanists approach data from the evidence first.
  2. Reading biased sources takes expertise, not algorithms.
  3. Data tells stories about structures.
  4. Expressing meaning rather than discoveries requires new forms.

Humanities Data starts with the evidence.


Successful Humanists

Harvard Cultural Observatory

Data vs Capta

Priestley, Chart of History, 1769

William Playfair, first line charts

Minard, Napoleon's retreat, 1869

A single publisher mentioned in half of all books



Google's partner libraries

Whaling Logbooks

1848 6 1     3723 29038 02 4    10ISABE*_N   1   5                                                           
165 20779701 69 5 0 1                  FFFFFF77AAAAAAAAAAAA     99 0 790044118480601 
3714N 6937W                                                                           NW     51 NW     57 NW   
51                                          201A.STEWART       NEW BEDFORD             WHALING V
OYAGE           2620 199
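The opening fields of the record above can be read by column position. The sketch below is a minimal illustration, not the archive's own decoder: the column boundaries, the reading of latitude and longitude as hundredths of a degree, and the east-positive longitude convention are all assumptions inferred from this one sample. They are at least consistent with the position "3714N 6937W" that appears later in the same record.

```python
# Leading characters of the sample record, copied from the slide.
record_prefix = "1848 6 1     3723 29038"

# ASSUMED column boundaries (start, end), inferred from this sample only.
FIELDS = {
    "year": (0, 4),
    "month": (4, 6),
    "day": (6, 8),
    "hour": (8, 12),   # blank in this record
    "lat": (12, 17),   # assumed: hundredths of a degree, north positive
    "lon": (17, 23),   # assumed: hundredths of a degree, measured eastward
}


def parse_prefix(line):
    """Slice the fixed-width prefix into integer fields; blanks become None."""
    parsed = {}
    for name, (start, end) in FIELDS.items():
        text = line[start:end].strip()
        parsed[name] = int(text) if text else None
    return parsed


def degrees_minutes_to_decimal(degrees, minutes):
    """Convert a degrees-and-minutes position to decimal degrees."""
    return degrees + minutes / 60


fields = parse_prefix(record_prefix)
lat_north = fields["lat"] / 100        # decimal degrees north
lon_west = 360 - fields["lon"] / 100   # convert east longitude to west

print(fields)
print(round(lat_north, 2), "N", round(lon_west, 2), "W")

# Cross-check against the "3714N 6937W" text in the same record.
print(round(degrees_minutes_to_decimal(37, 14), 2),
      round(degrees_minutes_to_decimal(69, 37), 2))
```

If the assumed layout is right, the coded position and the degrees-and-minutes position agree to within a hundredth of a degree, which is a useful sanity check before trusting any other column.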

Matthew Maury's Wind and Current Charts

CLIWOC vessels (European, 1750-1850)

Woodruff et al, ICOADS

Climate metadata, 1789-c.1860

  1. Filtering
  2. Abstraction
  3. Representation

Matthew Maury

Confederate Navy Engraving 1862, from

Abstract Logbooks

Harvard University Library

Undigitized Elements

New Bedford Whaling Museum

Digitized logbooks, c. 1930

Wallbrink and Koek

Logbook Digitization in the 1920s

Wallbrink, H. and F.B. Koek, Data Acquisition And Keypunching Codes For Marine Meteorological Observations At The Royal Netherlands Meteorological Institute, 1854–1968

Reconstructed Shipping Times

Deck 701, US Maury Collection (1789-c.1865)

The expansion of whaling

Data shows systems at work

Deck 701, US Maury Collection (1789-c.1865)

Deck 892, US shipping 1980-1997

ICOADS Deck 720, German weather data, 1876-1914

ICOADS Deck 735, Soviet Research Vessels

Closeup of Deck 735. Soviet Vessels near the coast of South America.

ICOADS Deck 735, Russian Research Vessel (R/V) Digitisation

German Deep Drifter Data (via ISDM; originally from IfM/Univ. Kiel)

Live beta at



  • Northeastern University/Rice University Cultural Observatory


  • Erez Lieberman Aiden
  • Neva Cherniavsky, Martin Camacho, Matt Nicklay, Billy Janitsch, JB Michel.

HathiTrust grant Partners

Stephen Downie, Peter Organisciak, Loretta Auvil, Colleen Fallaw, Robert McDonald

Mentions of US Presidents in Newspapers by Year, 1840-1922

Publication places over time of all the public domain volumes in HathiTrust

{"database": "hathipd",
"plotType": "map",
"method": "return_json",
"search_limits": {
    "date_year": {
        "$gte": 1800,
        "$lte": 1922
    }
},
"projection": "albers",
"aesthetic": {
    "time": "date_year",
    "point": "publication_place_geo",
    "size": "TextCount"
}}
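
A query like the one above is plain JSON, so it can be assembled and checked in code before it is sent anywhere. The sketch below is only an illustration: `build_map_query` is a hypothetical helper, not part of any published client, and no endpoint is contacted. The keys and values are the ones shown on the slide.

```python
import json


def build_map_query(database, point_field, size_field, year_min, year_max):
    """Assemble a map query as a dict (hypothetical helper for illustration)."""
    return {
        "database": database,
        "plotType": "map",
        "method": "return_json",
        # Restrict results to an inclusive range of publication years.
        "search_limits": {
            "date_year": {"$gte": year_min, "$lte": year_max}
        },
        "projection": "albers",
        # Map data fields onto visual channels.
        "aesthetic": {
            "time": "date_year",
            "point": point_field,
            "size": size_field,
        },
    }


query = build_map_query(
    "hathipd", "publication_place_geo", "TextCount", 1800, 1922
)

# Serializing catches structural mistakes such as unbalanced braces.
payload = json.dumps(query, indent=2)
print(payload)
```

Building the query as a dictionary and serializing it at the end keeps the nesting of `search_limits` and `aesthetic` explicit, which is where hand-written queries most often go wrong.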


Arbitrarily complex queries and visualizations: newspaper flu coverage, 1917-1919

{"database": "ChronAm",
"plotType": "map",
"method": "return_json",
"search_limits": {"word":["flu","influenza","pneumonia"]},
"aesthetic": {
  "point": "placeOfPublication_geo",
  "size": "TextPercent"}}


Seasonality of measles

19th-century whooping cough seems to be a winter disease

Whooping cough's patterns are all between months, not within them

"Croup" in American Newspapers

"Cough syrup" in American Newspapers

Exploratory Data Analysis as Interpretation

Garden of Forking Paths

Top 30 queries in the Rate My Professors browser, by slant.

Predominantly female:
mean nice helpful unfair annoying rude caring kind disorganized

Predominantly male:
funny smart sexy brilliant boring intelligent genius interesting cute good

No consistent difference:
hot bossy stupid hard bad teacher ugly fair easy dumb attractive

Changing the world

Bonus Tracks

The recent legendaries record whole armies and cities which were at once swept away by the undistinguishing rage of persecution. The more ancient writers content themselves with pouring out a liberal effusion of loose and tragical invectives, without condescending to ascertain the precise number of those persons who were permitted to seal with their blood their belief of the Gospel. From the history of Eusebius it may however be collected that only nine bishops were punished with death; and we are assured, by his particular enumeration of the martyrs of Palestine, that no more than ninety-two Christians were entitled to that honourable appellation.(182) As we are unacquainted with the degree of episcopal zeal and courage which prevailed at that time, it is not in our power to draw any useful inferences from the former of these facts: but the latter may serve to justify a very important and probable conclusion. According to the distribution of Roman provinces, Palestine may be considered as the sixteenth part of the Eastern empire: (183) and since there were some governors who, from a real or affected clemency, had preserved their hands unstained with the blood of the faithful,(184) it is reasonable to believe that the country which had given birth to Christianity produced at least the sixteenth part of the martyrs who suffered death within the dominions of Galerius and Maximin; the whole might consequently amount to about fifteen hundred, a number which, if it is equally divided between the ten years of the persecution, will allow an annual consumption of one hundred and fifty martyrs. Allotting the same proportion to the provinces of Italy, Africa, and perhaps Spain, where, at the end of two or three years, the rigour of the penal laws was either suspended or abolished, the multitude of Christians in the Roman empire, on whom a capital punishment was inflicted by a judicial sentence, will be reduced to somewhat less than two thousand persons.
Since it cannot be doubted that the Christians were more numerous, and their enemies more exasperated, in the time of Diocletian than they had ever been in any former persecution, this probable and moderate computation may teach us to estimate the number of primitive saints and martyrs who sacrificed their lives for the important purpose of introducing Christianity into the world.
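
The extrapolation in the passage above can be followed step by step. The sketch below only restates the figures given in the text (ninety-two martyrs, a sixteenth share, ten years); the variable names are illustrative, and the rounding to fifteen hundred is the passage's own "about", not a computed result.

```python
# Figures stated in the passage.
palestine_martyrs = 92     # enumerated martyrs of Palestine
provinces_in_east = 16     # Palestine taken as one sixteenth of the Eastern empire
persecution_years = 10     # years over which the total is divided

# Step 1: scale the Palestinian count up to the whole Eastern empire.
eastern_estimate = palestine_martyrs * provinces_in_east
print(eastern_estimate)    # 1472, which the passage rounds to "about fifteen hundred"

# Step 2: use the passage's rounded figure for the annual rate.
rounded_total = 1500
per_year = rounded_total // persecution_years
print(per_year)            # 150 martyrs per year
```

Every later number in the argument rests on the single enumerated count of ninety-two, so any bias in that source is multiplied sixteenfold in the estimate.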