Exploratory Narratives

Ben Schmidt - Northeastern University, History
Core Faculty, NuLab for Texts, Maps and Networks

2016-09-30

Data as a source

What does data journalism offer digital humanities?

  • Digital publication platforms.
  • Multimodal storytelling.
  • The ability to handle technical scaling and other challenges.
  • Coherent, thought-out design.

What are the advantages digital humanists hold?

  • Longer time horizons
  • More freedom to publish idiosyncratic material:
    • that doesn't fit into a CMS;
    • that doesn't scale;
    • that's ugly;
    • that's hard to understand.

Digesting Data

DSLs for archival datasets

Alternative ways to represent archival data.

  1. (Linked?) open data
  2. Reproducible research
  3. Web visualization applications
  4. Data narratives

Platform

  1. A large database (low gigabytes to low terabytes)
  2. A server-side mechanism for sending appropriate subsets of the data over the Internet
  3. A DSL (domain-specific language) to describe data and presentation
  4. Client-side rendering in D3/JavaScript to allow access to individual items

Deck 701: American Shipping, 1800-1850

Deck 892: US Shipping, 1980-1997

Decks of punchcards

0.0000001% of the ICOADS data set.

1848 6 1     3723 29038 02 4    10ISABE*_N   1   5                                                           
165 20779701 69 5 0 1                  FFFFFF77AAAAAAAAAAAA     99 0 790044118480601
3714N 6937W                                                                           NW     51 NW     57 NW   
51                                          201A.STEWART       NEW BEDFORD             WHALING V
OYAGE           2620 199
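The first line of this card image appears to follow the fixed-width IMMA core layout that ICOADS distributes, in which the date and position fill the first 23 columns. A minimal sketch of decoding those fields, assuming that layout (year, month, day, hour in hundredths, then latitude and longitude in hundredths of a degree, longitude running 0-360 east); the column table is an assumption checked only against this one record.

```javascript
// Minimal sketch: decode date and position from one fixed-width ICOADS card.
// ASSUMPTION: IMMA core column layout, verified only against the card above.
const CORE_FIELDS = [
  // [name, first column, last column], 1-indexed like format documentation
  ["year", 1, 4],
  ["month", 5, 6],
  ["day", 7, 8],
  ["hour", 9, 12], // hundredths of an hour; blank when unrecorded
  ["lat", 13, 17], // hundredths of a degree, negative = south
  ["lon", 18, 23], // hundredths of a degree east, 0-359.99
];

function parseCoreFields(line) {
  const out = {};
  for (const [name, first, last] of CORE_FIELDS) {
    const raw = line.slice(first - 1, last).trim();
    out[name] = raw === "" ? null : parseInt(raw, 10);
  }
  // Scale the integer fields to conventional units.
  if (out.hour !== null) out.hour /= 100;
  if (out.lat !== null) out.lat /= 100;
  if (out.lon !== null) {
    out.lon /= 100;
    if (out.lon > 180) out.lon -= 360; // shift to -180..180, west negative
  }
  return out;
}

const card = "1848 6 1     3723 29038 02 4    10ISABE*_N   1   5";
console.log(parseCoreFields(card));
```

On this card it yields roughly 37.23° N, 69.62° W, i.e. about 37°14' N, 69°37' W, the same position written out as "3714N 6937W" on the card's third line.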

Digitized logbooks, c. 1930

Wallbrink and Koek

Logbook Digitization in the 1920s

Wallbrink, H. and F.B. Koek, Data Acquisition And Keypunching Codes For Marine Meteorological Observations At The Royal Netherlands Meteorological Institute, 1854–1968

Deck 720: German weather data

Deck 735: Soviet weather data

Soviet Closeup

The expansion of whaling

An elaborate API call to the Maury browser.

    [
    {
        // Deck 701: American shipping
        "DCK": [701],
        "start": 365.25 * 1840,
        "end": 365.25 * 1940,
        "non_standard_fields": ["origin", "destination"],
        // Flag voyages that leave from or arrive at Boston, New Bedford, or Salem.
        "color": function(d) {
            var ports = /BOSTON|NEW BEDFORD|SALEM/;
            return ports.test(d.origin) || ports.test(d.destination);
        },
        "continuation": true,
        "horizon": 25,
        "projection": "oceanicHomolosine"
    },
    [...]
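Each object in the call above describes one layer: which deck to draw from, a time window, and a predicate deciding which points are highlighted. A hypothetical sketch of applying such a layer to a batch of records on the client; the record fields (`DCK`, `day`, `origin`, `destination`), the reading of `start`/`end` as days on a `365.25 * year` scale, and the sample voyages are all assumptions, not the Maury browser's actual code.

```javascript
// Hypothetical sketch: apply one layer spec to a batch of records.
// Record shape and the meaning of start/end are ASSUMPTIONS.
const NE_PORTS = /BOSTON|NEW BEDFORD|SALEM/;

function selectRecords(layer, records) {
  return records
    .filter((r) => layer.DCK.includes(r.DCK)) // keep only the requested decks
    .filter((r) => r.day >= layer.start && r.day < layer.end) // time window
    .map((r) => ({ ...r, highlight: Boolean(layer.color(r)) })); // flag for coloring
}

const whalingLayer = {
  DCK: [701],
  start: 365.25 * 1840,
  end: 365.25 * 1940,
  color: (d) => NE_PORTS.test(d.origin) || NE_PORTS.test(d.destination),
};

// Made-up records for illustration only.
const sampleVoyages = [
  { DCK: 701, day: 365.25 * 1848.4, origin: "NEW BEDFORD", destination: "PACIFIC" },
  { DCK: 701, day: 365.25 * 1848.4, origin: "NEW YORK", destination: "LIVERPOOL" },
  { DCK: 701, day: 365.25 * 1820, origin: "SALEM", destination: "CANTON" },
  { DCK: 892, day: 365.25 * 1990, origin: "BOSTON", destination: "HALIFAX" },
];

console.log(selectRecords(whalingLayer, sampleVoyages).map((r) => [r.origin, r.highlight]));
```

Only the two 1848 deck-701 voyages survive the filters, and only the New Bedford one is highlighted.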

Minard

Minard reproduction

An elaborate API call to the census browser.

{
"formula": "totpop_1870/(settledArea_1870) < 2 ? 2/(totpop_1870/(settledArea_1870)) : null",
"path_format": "county",
"scale_type": "brewer_quantile",
"show_counties": ["uts_sevier", "uts_piute", "uts_iron", "uts_kane"]
}
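The `formula` string is an ordinary ternary expression evaluated per county: where 1870 population per unit of settled area falls below 2, the county gets 2 divided by its density, so emptier counties get larger values; denser counties get `null`. The same logic as a plain function, assuming `totpop_1870` and `settledArea_1870` are numeric county fields (the area units are not stated on the slide):

```javascript
// The census-browser formula rewritten as a plain function.
// totpop and settledArea stand in for totpop_1870 and settledArea_1870.
function sparsenessValue(totpop, settledArea) {
  const density = totpop / settledArea;
  // Below 2 people per unit area: 2/density, which exceeds 1 and grows as
  // the county empties out. A zero-population county yields Infinity.
  // Otherwise: null, i.e. no value for this county.
  return density < 2 ? 2 / density : null;
}

console.log(sparsenessValue(500, 1000)); // density 0.5 -> 4
console.log(sparsenessValue(5000, 1000)); // density 5 -> null
```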

Texts

02138

Google's partner libraries

Reddit Ngrams (FiveThirtyEight)

bookworm.htrc.illinois.edu

DSL for large text databases

  1. A query language modeled on MongoDB
  2. A visualization grammar modeled on ggplot

Grammar for creating a corpus

{
"search_limits": {
    "publish_country": ["United States"],
    "year": {
        "$lte": 1920,
        "$gte": 1890
    }
}
}
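In this grammar a list means "any of these values" and `$gte`/`$lte` bound a range, as in MongoDB. A minimal sketch of that matching logic over in-memory metadata records; the records are made up, only these three condition types are handled, and a real backend would translate the limits into database queries rather than filter in JavaScript.

```javascript
// Sketch of MongoDB-style search_limits matching for one metadata record.
// Handles: a list of allowed values, $gte/$lte bounds, and a bare exact value.
function matchesLimits(record, limits) {
  return Object.entries(limits).every(([field, condition]) => {
    const value = record[field];
    if (Array.isArray(condition)) {
      return condition.includes(value); // value must be one of those listed
    }
    if (condition !== null && typeof condition === "object") {
      if ("$gte" in condition && !(value >= condition.$gte)) return false;
      if ("$lte" in condition && !(value <= condition.$lte)) return false;
      return true;
    }
    return value === condition; // bare value: exact match
  });
}

const corpusLimits = {
  publish_country: ["United States"],
  year: { $lte: 1920, $gte: 1890 },
};

// Made-up records for illustration only.
const books = [
  { title: "A", publish_country: "United States", year: 1900 },
  { title: "B", publish_country: "United Kingdom", year: 1900 },
  { title: "C", publish_country: "United States", year: 1925 },
];

console.log(books.filter((b) => matchesLimits(b, corpusLimits)).map((b) => b.title));
// prints [ 'A' ]
```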

And creating a plot

{
"database": "hathipd",
"plotType": "barchart",
"method": "return_json",
"search_limits": {
    "contributing_library__id": {
        "$lte": 15
    },
    "new_date": {
        "$lte": 1922
    },
    "word": ["02138"]
},
"aesthetic": {
    "x": "WordsPerMillion",
    "y": "contributing_library"
}
}
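`WordsPerMillion` is a normalized rate: occurrences of the search word divided by all words in each group, times one million. A sketch of that computation grouped by the field on the `y` aesthetic, assuming per-book counts of the search word and of all words are already available; the `hits` and `totalWords` fields and the library labels are hypothetical.

```javascript
// Compute words-per-million for each value of a grouping field.
// Input rows are hypothetical per-book counts: { <groupField>, hits, totalWords }.
function wordsPerMillion(rows, groupField) {
  const sums = new Map();
  for (const row of rows) {
    const key = row[groupField];
    const acc = sums.get(key) || { hits: 0, totalWords: 0 };
    acc.hits += row.hits;
    acc.totalWords += row.totalWords;
    sums.set(key, acc);
  }
  const result = {};
  for (const [key, { hits, totalWords }] of sums) {
    // Rate from summed counts (not an average of per-book rates).
    result[key] = (hits * 1e6) / totalWords;
  }
  return result;
}

const libraryCounts = [
  { contributing_library: "Library A", hits: 30, totalWords: 1000000 },
  { contributing_library: "Library A", hits: 10, totalWords: 1000000 },
  { contributing_library: "Library B", hits: 1, totalWords: 2000000 },
];
console.log(wordsPerMillion(libraryCounts, "contributing_library"));
// Library A: 20, Library B: 0.5
```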

{
"database": "hathipd",
"projection": "albers2",
"plotType": "map",
"method": "return_json",
"search_limits": {
    "new_date": {
        "$gte": 1700,
        "$lte": 2010
    },
    "word": ["test"]
},
"aesthetic": {
    "point": "publication_place_geo",
    "size": "TextCount",
    "time": "new_date"
},
"counttype": ["TextCount"],
"groups": ["publication_place_geo", "new_date"]
}
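The `groups` and `counttype` keys read like a SQL `GROUP BY ... COUNT(*)`: one `TextCount` per combination of publication place and date, which the map then uses for point size over time. A minimal sketch of that grouping over texts that already match the search; the input records and place labels are made up.

```javascript
// Count texts for every combination of the "groups" fields.
// Each input record is a hypothetical text that already matched search_limits.
function groupCounts(texts, groups) {
  const counts = new Map();
  for (const text of texts) {
    // Composite key from the grouping fields, e.g. '["placeA",1850]'.
    const key = JSON.stringify(groups.map((g) => text[g]));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // One output row per combination, in first-seen order.
  return Array.from(counts, ([key, n]) => {
    const values = JSON.parse(key);
    const row = { TextCount: n };
    groups.forEach((g, i) => {
      row[g] = values[i];
    });
    return row;
  });
}

const matchingTexts = [
  { publication_place_geo: "placeA", new_date: 1850 },
  { publication_place_geo: "placeA", new_date: 1850 },
  { publication_place_geo: "placeB", new_date: 1850 },
  { publication_place_geo: "placeA", new_date: 1900 },
];
console.log(groupCounts(matchingTexts, ["publication_place_geo", "new_date"]));
```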

"Methods are very similar. And I really wish we used the same software. Any way we can agree to develop new features on the same platforms?" https://t.co/7TsmLw2OZd (@jonathanstray, September 26, 2016)

{
"database": "hathipd",
"plotType": "heatmap",
"method": "return_json",
"search_limits": {
    "new_date": {
        "$gte": 1800,
        "$lte": 2000
    },
    "contributing_library__id": {
        "$lte": 20
    }
},
"aesthetic": {
    "y": "contributing_library",
    "x": "new_date",
    "color": "TextCount"
},
"counttype": ["TextCount"],
"groups": ["contributing_library", "new_date"],
"scaleType": "log"
}
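`"scaleType": "log"` matters here because library holdings span orders of magnitude; on a linear scale a few large libraries would wash out everything else. A minimal sketch of a logarithmic color position, assuming counts are positive (zero counts would map to negative infinity and need separate handling). D3's `d3.scaleLog` does the same job in a browser.

```javascript
// Map positive counts onto 0..1 on a log10 scale, so each tenfold
// increase moves the same distance along the color ramp.
function logScale(domainMin, domainMax) {
  const lo = Math.log10(domainMin);
  const hi = Math.log10(domainMax);
  return (value) => (Math.log10(value) - lo) / (hi - lo);
}

const colorPosition = logScale(1, 10000);
console.log(colorPosition(1), colorPosition(100), colorPosition(10000));
```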

Vogue Magazine, Yale University

Acknowledgements

Institutions

  • Northeastern University/Rice University Cultural Observatory

People

  • Erez Lieberman Aiden
  • Neva Cherniavsky, Martin Camacho, Matt Nicklay, Billy Janitsch, JB Michel, Piotr Organisciak.

Funders

  • Digital Public Library of America
  • Harvard Cultural Observatory
  • National Endowment for the Humanities
  • Mellon Foundation

Partners

  • HathiTrust Research Center

(Mild) Failure

Chronicle

Presidents in Google Ngrams

Presidents in Chronicling America

{
"database": "ChronAm",
"plotType": "heatmap",
"method": "return_json",
"search_limits": {
    "publish_year": {
        "$gte": 1860
    },
    "word": ["bicycle"]
},
"aesthetic": {
    "x": "publish_year",
    "y": "publish_day_year",
    "color": "WordsPerMillion"
}
}
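This heatmap puts the year on x and the day within the year on y, so seasonal usage would appear as horizontal bands. Deriving that y value from an issue date is a small calculation; this sketch assumes dates arrive as year, month, and day numbers, which may not be how the ChronAm database stores them.

```javascript
// Day of the year (1-366) for a Gregorian calendar date.
// UTC avoids daylight-saving shifts. Note: Date.UTC treats years 0-99 as
// 1900-1999, which does not matter for newspapers from 1860 on.
function dayOfYear(year, month, day) {
  const start = Date.UTC(year, 0, 1);
  const current = Date.UTC(year, month - 1, day);
  return Math.round((current - start) / 86400000) + 1;
}

console.log(dayOfYear(1896, 3, 1)); // 61 (1896 is a leap year)
console.log(dayOfYear(1900, 12, 31)); // 365 (1900 is not)
```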

chronicle.nytlabs.com

(Mild) Success

Language of the State of the Union

Mapping the State of the Union

Comparisons between texts

Contextual Reading

Jeb Bush, quitter

Exploring Limits

{
"search_limits": {
    "word": ["iPhone"],
    "date_year": {"$gte": 2002, "$lte": 2015}
},
"database": "RMP",
"aesthetic": {"x": "date_year", "y": "WordsPerMillion"},
"plotType": "linechart"
}

{
"database": "RMP",
"plotType": "barchart",
"search_limits": {
    "word": ["exams"]
},
"aesthetic": {
    "x": "WordsPerMillion",
    "y": "gender"
}
}

Student assessments and gender prejudices

Top search terms on movies.benschmidt.org

[1] love              fuck              shit              gay              
[5] internet          god               terrorist         zombie           
[9] alien             murder            war               environment      
[13] kill              hate              happy             human+trafficking
[17] Montreal          yes               hello             Fuck             
[21] jedi              montreal          no                computer         
[25] incest            America           cunt              vampire          
[29] fear              bitch             space             reset            
[33] chaperone         happiness         robot             dog              
[37] science           beer              yoda              drugs            
[41] escort            police            russian           awesome          
[45] batman            companion         meal+ticket       muslim           
[49] nigger            cool              plus+one          pollution        
[53] rape              United States     terrorism         the              
[57] man               chaperon          coffee            joy              
[61] new+york          Climate+Change    coke              crime            
[65] peace             vagina            anger             bitcoin          
[69] woman             nazi              suffering         lesbian          
[73] bipolar           Love              rage              sam              
[77] bunnies           curiosity         krueger           paris            
[81] sweatshop         gun               me                security         
[85] communist         compassion        freedom           girl             
[89] groovy            jew               weed              nsa              
[93] phone             sex+trafficking   Shit              suit             
[97] superman          The               want              whiskey          
66567 Levels:  ` `` ^ = =) >.< | ¯\\_(ツ)_/¯ - , ,  ; ;-) ;) : ... 萌    

Student assessments and gender prejudices

Top search terms on benschmidt.org/profGender

he, handsome, jerk, dick, arrogant, genius, gay, hilarious, entertaining, funny, sexy, clever, brilliant, old, cool, ass, intelligent, smart, fat, cute, interesting, weird, engaging, idiot, boring, great, knowledgeable, best, knowledgeable, inspiring, awesome, good, challenging, pretty, excellent, passionate, fun, lazy, difficult, hot, good teacher, hard, stupid, fair, dumb, bad, competent, easy, approachable, kind, hate, useless, tough, demanding, attractive, creative, bad teacher, enthusiastic, angry, amazing, love, clear, confusing, ugly, condescending, crazy, happy, understanding, worst, terrible, nice, friendly, helpful, mean, biased, harsh, horrible, unfair, awful, aggressive, organized, rude, incompetent, annoying, caring, strict, disorganized, evil, bossy, beautiful, sweet, she.

Narratives

http://benschmidt.org/word2vec_map/

#   [1] "he->she"                       "hes->shes"                    
#   [3] "himself->herself"              "his->her"                     
#   [5] "man->woman"                    "guy->lady"                    
#   [7] "grandpa->grandma"              "dude->chick"                  
#   [9] "wife->husband"                 "grandfather->grandmother"     
#  [11] "dad->mom"                      "uncle->aunt"                  
#  [13] "fatherly->motherly"            "brother->sister"              
#  [15] "actor->actress"                "grandfatherly->grandmotherly" 
#  [17] "father->mother"                "genius->goddess"              
#  [19] "arrogant->snobby"              "priest->nun"                  
#  [21] "dork->ditz"                    "handsome->gorgeous"           
#  [23] "atheist->feminist"             "himmmm->herrrr"               
#  [25] "kermit->degeneres"             "mans->womans"                 
#  [27] "hez->shez"                     "himmm->herrr"                 
#  [29] "trumpet->flute"                "checkride->clinicals"         
#  [31] "gay->lesbian"                  "surgeon->nurse"               
#  [33] "daddy->mommy"                  "cool->sweet"                  
#  [35] "monsieur->mme"                 "jolly->cheerful"              
#  [37] "jazz->dance"                   "wears->outfits"               
#  [39] "girlfriends->boyfriends"       "drle->gentille"               
#  [41] "gentleman->gem"                "charisma->spunk"              
#  [43] "egotistical->hypocritical"     "cutie->babe"                  
#  [45] "wingers->feminists"            "professore->molto"            
#  [47] "gruff->stern"                  "demonstrations->activities"   
#  [49] "goofy->wacky"                  "coolest->sweetest"            
#  [51] "architect->interior"           "sidetracked->frazzled"        
#  [53] "likeable->pleasant"            "grumpy->crabby"               
#  [55] "charismatic->energetic"        "cisco->cna"                   
#  [57] "masculinity->gender"           "girlfriend->boyfriend"   
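Pairs like "he->she" and "actor->actress" are the kind of output produced by the standard word-embedding analogy technique: take a word's vector, subtract the vector for *he*, add the vector for *she*, and return the nearest other word by cosine similarity. The slide does not say exactly how these pairs were generated, so this is only a sketch of that common technique, using made-up three-dimensional vectors instead of a trained word2vec model.

```javascript
// Gender-analogy lookup over toy word vectors (made-up numbers, 3 dimensions).
const toyVectors = {
  he: [1.0, 0.1, 0.0],
  she: [-1.0, 0.1, 0.0],
  actor: [0.9, 0.2, 0.8],
  actress: [-0.9, 0.2, 0.8],
  surgeon: [0.8, 0.9, 0.1],
  nurse: [-0.8, 0.9, 0.1],
};

const dot = (a, b) => a.reduce((s, x, i) => s + x * b[i], 0);
const cosine = (a, b) => dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));

// Return the word nearest to vec(word) - vec(from) + vec(to),
// excluding the three input words themselves.
function analogy(vectors, word, from = "he", to = "she") {
  const target = vectors[word].map((x, i) => x - vectors[from][i] + vectors[to][i]);
  let best = null;
  let bestScore = -Infinity;
  for (const [candidate, vec] of Object.entries(vectors)) {
    if ([word, from, to].includes(candidate)) continue;
    const score = cosine(target, vec);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}

console.log(`actor->${analogy(toyVectors, "actor")}`); // actor->actress
console.log(`surgeon->${analogy(toyVectors, "surgeon")}`); // surgeon->nurse
```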

http://benschmidt.org/similarities/

Difficult visualizations

http://benschmidt.org/timelines