Plot Arc-eology

Benjamin MacDonald Schmidt

Assistant Professor of History, Northeastern University

Core Faculty, NuLab for Texts, Maps, and Networks

May 2015

Bookworm

Acknowledgements

Institutions

  • Northeastern University/Rice University Cultural Observatory

People

  • Erez Lieberman Aiden
  • Neva Cherniavsky, Martin Camacho, Matt Nicklay, Billy Janitsch, JB Michel.

Funders

  • Digital Public Library of America
  • Harvard Cultural Observatory
  • National Endowment for the Humanities

Partners

  • HathiTrust Research Center
  • github.com/Bookworm-project
  • bookworm.culturomics.org
  • benschmidt.org/moosehead

Hathi Trust

A complicated example

{
"database": "catalogworm",
"plotType": "map",
"projection": "mercator",
"method": "return_json",
"search_limits": {
    "*htsource":[
        "University of Michigan",
        "University of California"
    ],
    "date_year": {
        "$gte": 1800,
        "$lte": 1922
    }
},
"aesthetic": {
    "color": "TextPercent",
    "size": "TotalTexts",
    "point": "publication_place_geo",
    "time": "date_year",
    "label": "publication_place_toponymName"
},
"scaletype":"linear"
}
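A query like the one above is plain JSON, so it can be assembled and serialized in a few lines. The sketch below is illustrative only: `BASE_URL`, the `query` URL parameter, and the helper names are hypothetical placeholders, not a documented Bookworm endpoint or client.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical placeholder; substitute the address of a real Bookworm installation.
BASE_URL = "https://example.org/bookworm/api"


def build_query(database, plot_type, search_limits, aesthetic, **extra):
    """Assemble a query dictionary with the keys used in the example above."""
    query = {
        "database": database,
        "plotType": plot_type,
        "method": "return_json",
        "search_limits": search_limits,
        "aesthetic": aesthetic,
    }
    query.update(extra)  # e.g. projection, scaletype
    return query


def query_url(query, base_url=BASE_URL):
    """Serialize the query as JSON and attach it as one URL parameter.

    The parameter name "query" is an assumption made for this sketch.
    """
    return base_url + "?" + urlencode({"query": json.dumps(query)})


def read_query(url):
    """Recover the query dictionary from a URL produced by query_url."""
    return json.loads(parse_qs(urlparse(url).query)["query"][0])


map_query = build_query(
    "catalogworm",
    "map",
    search_limits={
        "*htsource": ["University of Michigan", "University of California"],
        "date_year": {"$gte": 1800, "$lte": 1922},
    },
    aesthetic={
        "color": "TextPercent",
        "size": "TotalTexts",
        "point": "publication_place_geo",
        "time": "date_year",
        "label": "publication_place_toponymName",
    },
    projection="mercator",
    scaletype="linear",
)

print(query_url(map_query))
```

Keeping the query as a dictionary until the last moment makes it easy to vary one limit (say, the date range) while leaving the aesthetic mapping untouched.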

Open Subtitles Data

Open Subtitles.org

Open Subtitles Bookworm

Barchart of overall counts

{
"database": "movies",
"plotType": "barchart",
"method": "return_json",
"search_limits": {},
"aesthetic": {
    "x": "TotalTexts",
    "y": "medium"
}
}

Time chunking

Lazarsfeld-Stanton Program Analyzer

CBS internal program analysis

Ernst Dichter papers, Hagley Museum & Archive

A subtitle chunking algorithm:

  1. Split into thirds;
  2. While the chunk length is less than
     a. 6ths
     b. 12ths
     c. 24ths
     d. etc.
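The steps above can be sketched roughly as follows. Step 2 does not state its threshold, so this sketch assumes that subdivision continues as long as the next, finer chunks would still be at least `min_chunk_seconds` long. That parameter, its default value, and both function names are hypothetical.

```python
def chunk_levels(runtime_seconds, min_chunk_seconds=300.0):
    """Return the chunk counts to generate: 3, then 6, 12, 24, ...

    Assumption: keep doubling while the finer chunks would still be at
    least min_chunk_seconds long. The real stopping rule is not given.
    """
    if runtime_seconds <= 0 or min_chunk_seconds <= 0:
        raise ValueError("runtime and threshold must be positive")
    levels = [3]  # step 1: always split into thirds
    n = 3
    while runtime_seconds / (n * 2) >= min_chunk_seconds:
        n *= 2  # step 2: 6ths, 12ths, 24ths, ...
        levels.append(n)
    return levels


def assign_chunks(subtitle_times, runtime_seconds, n_chunks):
    """Map each subtitle start time (in seconds) to a 1-based chunk index."""
    chunk_length = runtime_seconds / n_chunks
    # A subtitle starting exactly at the end falls into the last chunk.
    return [min(int(t // chunk_length) + 1, n_chunks) for t in subtitle_times]


# A 90-minute film gets thirds, 6ths and 12ths under the assumed threshold.
print(chunk_levels(5400))
print(assign_chunks([0, 100, 599, 600], 600, 3))
```

Storing every level rather than only the finest one is what would let a query group by a field such as `6th` for some plots and by a coarser or finer division for others.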

Number of chunks, by minute.

{
"database": "screenworm",
"plotType": "linechart",
"method": "return_json",
"search_limits": {
    "minute": {
        "$lte": 240
    }
},
"aesthetic": {
    "y": "TotalTexts",
    "x": "minute",
    "color": "medium"
}
}

Searching individual words

Words and phrases show very strong trends

{
"database": "screenworm",
"plotType": "linechart",
"search_limits": {
    "movieYear": {
        "$gte": 1830,
        "$lte": 2022
    },
    "6th": [1, 2, 3, 4, 5, 6],
    "*TV_show": ["Seinfeld"],
    "*topic_label": ["call phone Hello number called message calling"]
},
"words_collation": "Case_Sensitive",
"aesthetic": {
    "x": "6th",
    "y": "WordsPerMillion"
}
}

The 9 most linear-trending topics, by fittedness of linear model
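One way to produce a ranking like this is to fit a straight line to each topic's frequency across the chunks of a plot and sort by goodness of fit. The sketch below assumes "fittedness" means the R² of an ordinary least-squares line; the topic labels and numbers are toy values invented for illustration, not results from the data.

```python
def linear_r_squared(ys):
    """R^2 of a least-squares line fitted to ys against positions 1..n."""
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three points")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        return 0.0  # a flat series has no trend to explain
    return (sxy * sxy) / (sxx * syy)


def rank_by_linear_fit(topic_series, top_n=9):
    """Return the top_n topic labels, best linear fit first."""
    scored = [(linear_r_squared(ys), label) for label, ys in topic_series.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored[:top_n]]


# Toy frequencies per 6th of the runtime (invented for this example).
toy_topics = {
    "rising": [1, 2, 3, 4, 5, 6],
    "noisy rise": [1, 3, 2, 5, 4, 6],
    "peak in middle": [1, 3, 5, 5, 3, 1],
}
print(rank_by_linear_fit(toy_topics))
```

A topic that peaks in the middle scores near zero here, which is why a separate measure is needed to find center-oriented topics.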

The 9 most center-oriented topics, by fittedness of linear model

Plots are tracing paths through a multidimensional space
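To make the idea of a path concrete: if each chunk of a film or episode is represented by a vector of topic proportions, the sequence of chunks is a sequence of points, and the plot is the line connecting them. The sketch below uses three toy dimensions instead of 128 and assumes Euclidean distance as the measure of movement; both the data and the choice of distance are illustrative assumptions.

```python
import math


def step_lengths(path):
    """Euclidean distance between each consecutive pair of points on the path."""
    return [math.dist(a, b) for a, b in zip(path, path[1:])]


def net_displacement(path):
    """Straight-line distance from the first point to the last."""
    return math.dist(path[0], path[-1])


# Toy path: three chunks, each a vector of topic proportions (invented values).
toy_path = [
    (1.0, 0.0, 0.0),  # opening dominated by the first topic
    (0.0, 1.0, 0.0),  # middle dominated by the second
    (0.0, 0.0, 1.0),  # ending dominated by the third
]

print(step_lengths(toy_path))
print(net_displacement(toy_path))
```

Comparing total path length with net displacement gives a rough sense of whether a plot moves steadily in one direction or wanders and returns.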

Plot arcs in 128-dimensional topic space
