Big Data Across the Disciplines, from Cultural Studies to Culturomics




Benjamin MacDonald Schmidt

Fellow, Cultural Observatory @ Harvard

Ph.D. Candidate in History, Princeton University

What to do with millions of texts?

  • Nothing
  • Science
  • Hire Programmers
  • Focused Reading

    Reading Digital Sources

    1. Technical Competence
    2. Understanding of biases (Source Criticism)
    3. Technique for reading (Hermeneutics)
    4. Argument

    Texts without Authors

    Whatever vision of the digital humanities is proclaimed, it will have little place for the likes of me and for the kind of criticism I practice: a criticism that narrows meaning to the significances designed by an author, a criticism that generalizes from a text as small as half a line, a criticism that insists on the distinction between the true and the false, between what is relevant and what is noise, between what is serious and what is mere play.

    Stanley Fish

    Whaling Logbooks

    Baumann Rare Books
            1848 6 1     3723 29038 02 4    10ISABE*_N   1   5                                                           165 20779701 69 5 0 1                  FFFFFF77AAAAAAAAAAAA     99 0 790044118480601  3714N 6937W                                                                           NW     51 NW     57 NW     51                                          201A.STEWART       NEW BEDFORD             WHALING VOYAGE           2620 199
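
A record like the one above can be read mechanically once a field layout is assumed. The sketch below is illustrative only. It assumes that the first three tokens are year, month, and day, and that "3714N 6937W" encodes degrees and minutes with a hemisphere letter. Neither assumption comes from the format's documentation, and the helper names are hypothetical. A production parser would slice by the documented column offsets instead.

```python
import re

# Excerpt of the record shown above; long runs of padding are shortened here.
RECORD = ("1848 6 1     3723 29038 02 4    10ISABE*_N   1   5   "
          "3714N 6937W   NEW BEDFORD   WHALING VOYAGE")


def parse_date(record):
    """Assumption: the first three whitespace-separated tokens are year, month, day."""
    year, month, day = (int(tok) for tok in record.split()[:3])
    return year, month, day


def parse_position(record):
    """Assumption: 'DDMMN DDDMMW' means degrees and minutes plus a hemisphere letter."""
    m = re.search(r"\b(\d{2})(\d{2})([NS])\s+(\d{2,3})(\d{2})([EW])\b", record)
    if m is None:
        return None
    lat = int(m.group(1)) + int(m.group(2)) / 60
    lon = int(m.group(4)) + int(m.group(5)) / 60
    if m.group(3) == "S":
        lat = -lat
    if m.group(6) == "W":
        lon = -lon
    # Return signed decimal degrees (south and west are negative).
    return round(lat, 2), round(lon, 2)


print(parse_date(RECORD))      # (1848, 6, 1)
print(parse_position(RECORD))  # (37.23, -69.62)
```

Under these assumptions the position falls in the western North Atlantic, which is at least plausible for a New Bedford whaling voyage. It also agrees with the "3723 29038" fields if those are read as hundredths of a degree with longitude counted eastward, though that reading is a guess as well.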
          

    Logbooks in Abstract

    Harvard University Library

    Logbooks as punchcards

    Wallbrink, H. and F.B. Koek, Data Acquisition And Keypunching Codes For Marine Meteorological Observations At The Royal Netherlands Meteorological Institute, 1854–1968

    German Merchant Marine Voyages

    Whaling Voyages

    Undigitized Elements

    New Bedford Whaling Museum

    Whaling Vessel Crews

    New Bedford Whaling Museum

    Physical Descriptions of Whaling Crewmembers

    New Bedford Whaling Museum

    Google Ngrams

    http://books.google.com/ngrams

    02138

    Library Origins of Bookworm Volumes

    Bookworm: Exploring Texts through Metadata

    (http://bookworm.culturomics.org)

    c. 1 million books, 80 billion words

    Library metadata via Open Library

    Digital Public Library of America funding
    Team: Harvard Cultural Observatory, Rice Cultural Observatory, Northeastern University
    Martin Camacho * Neva Cherniavsky * Erez Lieberman-Aiden * JB Michel * Billy Janitsch

    Guiding philosophy

    1. Digital libraries are places to watch the interaction of metadata.
    2. Metadata is about the text (at whatever scale).
    3. Words and phrases are (just?) more metadata.
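
One way to make the third point concrete is to treat a word as just another facet that can be crossed with library metadata. The sketch below is a minimal illustration, not Bookworm's implementation. The toy corpus, its field names (`year`, `place`), and the function `word_share_by` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus: each document carries library-style metadata plus its text.
DOCS = [
    {"year": 1850, "place": "Boston", "text": "the whale and the sea"},
    {"year": 1850, "place": "London", "text": "the city and the river"},
    {"year": 1900, "place": "Boston", "text": "the whale ship returned"},
]


def word_share_by(docs, field, word):
    """Share of all tokens equal to `word`, grouped by one metadata field."""
    hits = Counter()
    totals = Counter()
    for doc in docs:
        tokens = doc["text"].lower().split()
        totals[doc[field]] += len(tokens)
        hits[doc[field]] += tokens.count(word)
    return {key: hits[key] / totals[key] for key in totals}


print(word_share_by(DOCS, "year", "whale"))   # {1850: 0.1, 1900: 0.25}
print(word_share_by(DOCS, "place", "whale"))
```

The same function answers a question about time or about place simply by changing the `field` argument, which is the sense in which the word and the catalog record sit on the same footing.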

    Textual Metadata: Newspaper Locations

    Bill Lane Center for the American West:
    http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers

    Textual Metadata: Correspondence Networks

    Mapping the Republic of Letters

    Textual Metadata: Trial lengths

    Data Mining with Criminal Intent/The Old Bailey Online

    Grounding Words in Texts

    Comparing Custom Corpora

    Focus Attention

    Shifts in language happen across different temporal dimensions at once

    Cohort effects and temporal effects are evenly split