Bookworm: Exploring and Exposing Digital Texts through Metadata

Benjamin Schmidt
Assistant Professor of History, Northeastern University
Core Faculty, NULab for Texts, Maps, and Networks

www.benschmidt.org

Bookworm: Exploring Texts through Metadata

(http://bookworm.culturomics.org)

c. 1 million books, 80 billion words

Library metadata via Open Library

Digital Public Library of America funding
Team: Harvard Cultural Observatory, Rice Cultural Observatory, Northeastern University
Martin Camacho * Neva Cherniavsky * Erez Lieberman-Aiden * JB Michel * Billy Janitsch

Guiding philosophy

  1. Digital libraries are places to watch the interaction of metadata.
  2. Metadata is about the text (whatever scale).
  3. Words and phrases are (just?) more metadata.

Grounding Words in Texts

Google Ngrams

http://books.google.com/ngrams

02138

Text Level Indexing

Comparing Custom Corpora

Bookworm Arxiv

600,000 math and physics articles from the last 20 years

arxiv.culturomics.org

Mentions of US Presidents in Ngrams and the Chronicling America Bookworm

Coverage of presidential candidates in the 1896 election by candidate last name.

Coverage of all presidential elections, 1860-1922

Coverage of presidential candidates in the 1872 election

The Bookworm API

  • Specify request using JSON queries

  • Post using HTTP

  • Return data in JSON or TSV

  • http://benschmidt.org/beta/APISandbox
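
The request cycle in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Bookworm API: the endpoint URL is a placeholder, and the query field names (`database`, `search_limits`, `groups`, `format`) as well as the helper names are hypothetical. The sandbox linked above is the place to check the real query vocabulary.

```python
import csv
import io
import json
import urllib.request

# Hypothetical endpoint: the slides do not give the real API URL.
API_URL = "http://example.org/bookworm/api"


def build_query(database, word, group_by="year", fmt="json"):
    """Assemble a query as a plain dict (all field names are assumptions)."""
    return {
        "database": database,
        "search_limits": {"word": [word]},
        "groups": [group_by],
        "format": fmt,
    }


def make_request(query, url=API_URL):
    """Wrap the JSON-encoded query in an HTTP POST request object."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(text, fmt="json"):
    """Decode a response body returned as JSON or as tab-separated values."""
    if fmt == "json":
        return json.loads(text)
    if fmt == "tsv":
        # One dict per row, keyed by the TSV header line.
        return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    raise ValueError(f"unsupported format: {fmt}")
```

Nothing here touches the network until the request is sent, for example with `urllib.request.urlopen(make_request(query))`, after which the decoded body can be passed to `parse_response` with the matching format.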

Extending the platform

Hathi Trust Bookworm.
