Humanities Visualization in Cartesian space

Benjamin Schmidt
Assistant Professor of History, Northeastern University
Core Faculty, Nulab for Texts, Maps, and Networks

www.benschmidt.org

Humanistic Data Visualization in Practice

The T

Source: Massachusetts Bay Transportation Authority

benschmidt.org/mbta

benschmidt.org/mta

Visualizing color over words.

Source: Lisa M Rhody, Topic Modeling and Figurative Language, Journal of Digital Humanities 2.1

Word Clouds

Source: wordle.net; this talk

Critiques of Word Clouds

Source: Drew Conway, Building a Better Word Cloud (January 2011)

Why don't humanists like Cartesian representations?

Source: "Visualizing Emancipation," http://dsl.richmond.edu/emancipation/
Source: Johanna Drucker, Humanities Approaches to Graphical Display, DHQ Volume 5 Number 1.

Returning to Cartesian space

Bookworm: Exploring Texts through Metadata

(http://bookworm.culturomics.org)

c. 1 million books, 80 billion words

Library metadata via Open Library

Digital Public Library of America funding
Team: Harvard Cultural Observatory, Rice Cultural Observatory, Northeastern University
Martin Camacho * Neva Cherniavsky * Erez Lieberman-Aiden * JB Michel * Billy Janitsch

Guiding philosophy

  1. Digital libraries are places to watch the interaction of metadata.
  2. Metadata is about the text (whatever scale).
  3. Words and phrases are (just?) more metadata.

  • Metadata describes the world we care about
  • Huge metadata collections are worth studying on their own
  • Climatological Metadata is Historical Data

    Textual Metadata: Newspaper Locations

    Bill Lane Center for the American West:
    http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers

    Textual Metadata: Correspondence Networks

    Mapping the Republic of Letters

    Textual Metadata: Trial lengths

    Data Mining with Criminal Intent/The Old Bailey Online

    Grounding Words in Texts

    Google Ngrams

    http://books.google.com/ngrams

    Comparing Custom Corpora

    Bookworm Arxiv

    600,000 math and physics articles from the last 20 years

    arxiv.culturomics.org

    Chronicling America.

    Bookworm: Exploring Texts with Metadata

    The Bookworm API

  • Request data using JSON queries

  • Send queries as HTTP GET requests

  • Returns data as JSON or TSV

  • An example query

    
                {
                    "method": "return_tsv",
                    "counttype": ["WordsPerMillion"],
                    "search_limits": {
                        "country": ["USA", "UK"],
                        "word": ["natural selection"]
                    },
                    "groups": ["year"],
                    "database": "OL"
                }
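
As a rough sketch of how such a query might be sent, the Python below serializes the query to JSON and attaches it to a GET URL. The endpoint address and the `query` parameter name are assumptions for illustration only; the slides say that queries are JSON sent over HTTP GET but do not give the URL or parameter name.

```python
import json
from urllib.parse import urlencode, parse_qs, urlsplit

# Hypothetical endpoint: the slides do not state the real API address.
ENDPOINT = "http://bookworm.example.org/api"

def build_query_url(query, endpoint=ENDPOINT, param="query"):
    """Serialize the query to JSON and attach it as a single GET parameter.

    The parameter name ("query") is an assumption, not a documented name.
    """
    return endpoint + "?" + urlencode({param: json.dumps(query)})

# The example query from the slide.
query = {
    "method": "return_tsv",
    "counttype": ["WordsPerMillion"],
    "search_limits": {
        "country": ["USA", "UK"],
        "word": ["natural selection"],
    },
    "groups": ["year"],
    "database": "OL",
}

url = build_query_url(query)

# Round-trip check: decoding the URL recovers the original query.
decoded = json.loads(parse_qs(urlsplit(url).query)["query"][0])
```

Fetching `url` with any HTTP client would then be the request itself; the sketch stops short of that because the endpoint is a placeholder.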
          

    The Response

    
                year    WordsPerMillion
                [...]
                1907    340.20526777
                1908    341.83114533
                1909    295.24911692
                1910    282.24802327
                1911    284.92406591
                1912    283.89805752
                1913    296.87614627
                1914    332.76147647
                1915    446.39889626
                1916    428.87396542
                1917    527.51044740
                1918    647.48528263
                1919    653.05159042
                1920    507.23177682
                1921    501.77615474
                [...]
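
A minimal sketch of reading a response like this one in Python, assuming the body is plain tab-separated text with a header row. The sample string copies four rows from the slide rather than coming from a live request, and `parse_series` is a hypothetical helper name.

```python
import csv
import io

# Four rows copied from the response on the slide (tab-separated).
sample = (
    "year\tWordsPerMillion\n"
    "1917\t527.51044740\n"
    "1918\t647.48528263\n"
    "1919\t653.05159042\n"
    "1920\t507.23177682\n"
)

def parse_series(text, value_column="WordsPerMillion"):
    """Turn a year/value TSV into a {year: value} dictionary."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["year"]): float(row[value_column]) for row in reader}

series = parse_series(sample)

# Year with the highest rate among the sampled rows.
peak_year = max(series, key=series.get)
```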
          

    Interactions among metadata

    
          {
              "method": "return_tsv",
              "counttype": ["WordsPerMillion"],
              "search_limits": {"word": ["natural selection"]},
              "groups": ["year", "state"][::-1] and ["state", "year"],
              "database": "OL"
          }
      
    
          state	year	WordsPerMillion
          [...]
          NJ	1901	0E-8
          NJ	1902	0E-8
          NJ	1903	0.52162392
          NJ	1904	0E-8
          NJ	1905	0E-8
          NJ	1906	0E-8
          NJ	1907	0.52719259
          NJ	1908	0.59582825
          NJ	1909	0.23120944
          NJ	1910	1.08461634
          [...]
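
With two grouping fields, each row carries one state and one year. The sketch below, which assumes the same tab-separated layout, nests the values by state and then year. Note that `0E-8` is simply zero in scientific notation, and Python's `float` reads it as `0.0`. The sample rows are copied from the slide; `by_state` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the response on the slide (tab-separated).
sample = (
    "state\tyear\tWordsPerMillion\n"
    "NJ\t1906\t0E-8\n"
    "NJ\t1907\t0.52719259\n"
    "NJ\t1908\t0.59582825\n"
)

def by_state(text):
    """Build a {state: {year: value}} mapping from a grouped TSV."""
    table = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        table[row["state"]][int(row["year"])] = float(row["WordsPerMillion"])
    return dict(table)

table = by_state(sample)

# Years in which the phrase appears at all in New Jersey.
nonzero_years = [year for year, value in table["NJ"].items() if value > 0]
```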
    

    Returning Words As Metadata

    
          {
              "method": "return_tsv",
              "counttype": ["WordCount"],
              "search_limits": {
                  "year": [1877],
                  "state": ["RI"]
              },
              "groups": ["unigram", "year"],
              "database": "presidio"
          }
      
    
          unigram	year	WordCount
          [...]
          resolve	1877	8
          resolved	1877	272
          resolves	1877	10
          resolving	1877	2
          resort	1877	10
          resorted	1877	2
          resorts	1877	4
          resound	1877	1
          [...c. 23,000 total rows...]
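
Because words come back as just another grouping column, the result can be treated as a word-frequency table. A small sketch, assuming the same tab-separated layout, that ranks words by count; the sample holds four rows copied from the slide, not the full response of roughly 23,000 rows.

```python
import csv
import io
from collections import Counter

# Four rows copied from the response on the slide (tab-separated).
sample = (
    "unigram\tyear\tWordCount\n"
    "resolve\t1877\t8\n"
    "resolved\t1877\t272\n"
    "resolves\t1877\t10\n"
    "resolving\t1877\t2\n"
)

def word_counts(text):
    """Sum WordCount per unigram across all returned rows."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        counts[row["unigram"]] += int(row["WordCount"])
    return counts

counts = word_counts(sample)

# Most frequent word among the sampled rows, with its count.
top_word, top_count = counts.most_common(1)[0]
```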
      

    Bookworm ChronAm

    4 million newspaper pages, 1840-1922

    From chroniclingamerica.loc.gov

    Location, Subject, and Ethnic metadata

    Anachronisms Reflect What Change is Considered "Historical"

    Mad Men with Computers.

    Moby Dick between geography and fiction
