State Traces

Climate metadata, 1789-c.1860

CLIWOC vessels (European, 1750-1850)

Reconstructed Shipping Times

The expansion of whaling

Deck 701, US Maury Collection (1789-c.1865)

Deck 892, US shipping 1980-1997

Logbook Digitization in the 1920s

Wallbrink, H. and F.B. Koek, Data Acquisition And Keypunching Codes For Marine Meteorological Observations At The Royal Netherlands Meteorological Institute, 1854–1968

Digitized logbooks, c. 1930

Wallbrink and Koek

ICOADS Deck 720, German weather data, 1876-1914

ICOADS Deck 735, Soviet Research Vessels

Closeup of Deck 735. Soviet Vessels near the coast of South America.

ICOADS Deck 735, Russian Research Vessel (R/V) Digitisation

German Deep Drifter Data (via ISDM; originally from IfM/Univ. Kiel)


Organization of knowledge

Classifying Knowledge

Bacon's classification

Jefferson's Classification

Library of Congress, 1898

Organization of cataloging at the Library of Congress, 1909




Ngrams: 3 2044

3 2044

Google's partner libraries

Words that drop off between 1918-1922 and 1923-1927

Bookworm: HathiTrust

Some corpora already in Bookworm:

  1. Open Library (1 million books)
  2. HathiTrust (4 million books)
  3. Medical Heritage Library (250,000 medical books)
  4. Chronicling America (6 million newspaper pages, USA 1850-1922)
  5. Index Catalog (4 million medical articles, 1500-1950, titles only)
  6. (c. 600,000 science articles, 1995-present)

An example narrative: British parliament

"Nearest Neighbor" book searching

1,000,000 books

Some questions:

  • How do classifications shape what gets written?
  • How do state classifications (Library of Congress) inform and disagree with "academic" classifications (e.g., JSTOR)?
  • How can machine classification help us to understand classification?
  • What does government classification have to do with government security classification?