Plot Arceology

Benjamin M Schmidt
Assistant Professor of History, Northeastern University
Core Faculty, NuLab for Texts, Maps, and Networks

October 2015

Modeling Plots

Approaches to modeling plot

  • Jockers (2015)
      • Sentiment analysis to trace fortune/misfortune
      • Pre-chosen feature set from standard dictionaries
      • Emergent patterns of plots
  • Piper (2015)
      • Conversion narratives modeled
      • Feature set emergent from works
      • Pre-chosen plot structures
  • Other types of sources
      • Beauchamp - Political Speeches
      • Reiter - Fairy tales

Plot arcs

  1. Corpus: Television shows and scripts
  2. Method: Characterization of aggregate patterns through dimensionality reduction (topic modeling).
  • Applicable to other corpora

The corpus

Lazarsfeld-Stanton Program Analyzer

CBS internal program analysis

Ernst Dichter papers, Hagley Museum & Archive

OpenSubtitles.org

Open Subtitles Bookworm

Linechart of different media over time

{
"database": "movies",
"plotType": "linechart",
"method": "return_json",
"search_limits": {"MovieYear":{"$lte":2020}},
"aesthetic": {
    "x": "MovieYear",
    "y": "TotalTexts",
    "color": "medium"
}   
}

A subtitle chunking algorithm:

  1. Split into thirds;
  2. While the chunk length is greater than two minutes, split each chunk in half, yielding successively 6ths, 12ths, 24ths, etc.
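
The chunking steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the Bookworm: the function names, the use of minutes as the unit, and the reading of "chunk length" as runtime divided by the number of chunks are all assumptions.

```python
def chunk_fractions(runtime_minutes, max_chunk_minutes=2.0):
    """Return the successive chunk counts (3, 6, 12, ...) for one film or episode.

    Hypothetical helper: starts at thirds and keeps halving while each
    chunk would still be longer than max_chunk_minutes.
    """
    counts = [3]  # step 1: split into thirds
    # step 2: halve every chunk while chunks are still longer than the threshold
    while runtime_minutes / counts[-1] > max_chunk_minutes:
        counts.append(counts[-1] * 2)
    return counts


def assign_chunk(timestamp_minutes, runtime_minutes, n_chunks):
    """Return the 1-based chunk index for a subtitle line at a given timestamp."""
    # Clamp the relative position to [0, 1] so stray timestamps stay in range.
    position = min(max(timestamp_minutes / runtime_minutes, 0.0), 1.0)
    # The final instant of the runtime belongs to the last chunk.
    return min(int(position * n_chunks) + 1, n_chunks)
```

Under these assumptions, a 44-minute episode would be divided into 3rds, 6ths, 12ths, and 24ths, and a line spoken at minute 22 would fall in the fourth 6th.
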
{
"database": "screenworm",
"plotType": "linechart",
"method": "return_json",
"search_limits": {"6th":{"$gte":1},
"word":["love you"]},
"aesthetic": {
    "x": "6th",
    "y": "WordsPerMillion",
    "color": "medium"
}
}

Interactives

benschmidt.org/arceology

Plot arcs

Example topics

topic  label
2      Wait wait minute Let’s Look Hurry let’s
10     clean smell water wash use bath bathroom
11     film movie show TV movies scene play
31     hair funny look joke laugh big teeth
34     sir course Thank dear London quite Ah
60     talk talking Look crazy understand Listen problem
61     animals animal bear food wild hunting lion
74     hear voice heard sound radio noise listen
78     game play ball team playing win football
80     girl girls boy look name beautiful pretty
85     drink wine beer drunk bottle drinking glass
90     years world land water sea ago life
110    Madame de Monsieur French dear course evening
113    God cool Whoa Look look dude Wow
119    married wife wedding husband love marriage woman
121    Agent agent security FBI team CIA agents

Top topics in Law and Order, by number of total words in topic per sixth.

Individual topic trends interactive

The 9 most linear-trending topics, ranked by goodness of fit of a linear model
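
One way to rank topics by how well a straight line describes them is to compute R² for each topic's share across chunks. The sketch below assumes that ranking criterion and a hypothetical input format (a dict mapping topic id to a list of per-chunk shares); the slides do not specify either.

```python
def linear_r_squared(shares):
    """R^2 of an ordinary least-squares line fit of topic share against chunk index."""
    n = len(shares)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in shares)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, shares))
    if sxx == 0 or syy == 0:
        return 0.0  # a single chunk or a perfectly flat topic has no trend to fit
    # For a one-predictor regression, R^2 equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


def rank_by_linearity(topic_shares, top_n=9):
    """Return the top_n topic ids, best linear fit first (hypothetical helper)."""
    scores = {topic: linear_r_squared(v) for topic, v in topic_shares.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

A topic that rises steadily from the first chunk to the last scores near 1, while a topic that peaks in the middle and falls back scores near 0 on this measure.
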

The 9 most center-oriented topics, ranked by goodness of fit of a linear model

Aggregate chunks

Six plots in a two dimensional vector space

Overall characteristic plot movements in topic space, reduced down to 2 principal components
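
The reduction to two principal components could be sketched as below, using NumPy's SVD. The input layout (one row per chunk or episode-chunk, one column per topic) and the function name are assumptions for illustration, not a description of the pipeline actually used.

```python
import numpy as np


def pca_project(topic_matrix, n_components=2):
    """Project rows of a (chunks x topics) matrix onto their top principal components.

    Returns the projected coordinates and the share of total variance
    captured by each retained component. Assumes the matrix is not constant.
    """
    X = np.asarray(topic_matrix, dtype=float)
    centered = X - X.mean(axis=0)  # PCA requires column-centered data
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]  # scores on each component
    explained = S ** 2 / np.sum(S ** 2)  # variance share per component
    return coords, explained[:n_components]
```

Plotting the first column of `coords` against the second, in chunk order, would trace a path through the reduced topic space of the kind the figure describes.
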

Percentage of remaining variance explained by each Principal Component compared to random walk data
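
A comparison of this kind could be sketched as follows. It assumes that "remaining variance" means each component's variance divided by the variance not yet accounted for by earlier components, and that the baseline is a Gaussian random walk of the same shape as the real data; both readings are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def remaining_variance_share(matrix):
    """For each principal component, the share of not-yet-explained variance it captures."""
    X = np.asarray(matrix, dtype=float)
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    var = singular_values ** 2
    # Variance still unexplained before each component (itself included).
    remaining = np.cumsum(var[::-1])[::-1]
    return np.divide(var, remaining, out=np.zeros_like(var), where=remaining > 0)


def random_walk_like(shape, seed=0):
    """Baseline data: independent Gaussian random walks down each column."""
    rng = np.random.default_rng(seed)
    return np.cumsum(rng.normal(size=shape), axis=0)
```

Calling `remaining_variance_share` on the real chunk-by-topic matrix and on `random_walk_like(real.shape)` gives two curves to plot side by side; components where the real data sits well above the baseline are the ones less likely to be artifacts of sequential structure alone.
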

Conclusions