Bookworm



Open Library Bookworm

1 million books

80 billion words

Library metadata via Open Library

arXiv Bookworm

600,000 Math and Physics Articles

JSTOR Bookworm

600,000 articles from before 1922

Great data on disciplines, excellent metadata quality

Historical newspapers

4 million newspaper pages

From chroniclingamerica.loc.gov

Query Structure

To generate a time chart for a new phrase:
"natural selection"


{
  "method": "return_tsv",
  "collation": "Case_Sensitive",
  "counttype": ["WordsPerMillion"],
  "search_limits": {
    "word": ["natural selection"]
  },
  "groups": ["year"],
  "database": "OL"
}
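
Because the query is plain JSON, it can be generated for any phrase instead of being edited by hand. The Python sketch below builds the same time-chart query for an arbitrary phrase and parses a return_tsv response into a year-to-value mapping. It assumes the response is tab-separated, with one header row followed by one row per year (year first, then the WordsPerMillion value). The function names are illustrative, and no server URL is assumed, so sending the JSON to a Bookworm instance is left out.

```python
import json


def time_chart_query(phrase: str, database: str = "OL") -> str:
    """Build the time-chart query for an arbitrary phrase (illustrative helper)."""
    query = {
        "method": "return_tsv",
        "collation": "Case_Sensitive",
        "counttype": ["WordsPerMillion"],
        "search_limits": {"word": [phrase]},
        "groups": ["year"],
        "database": database,
    }
    # json.dumps always emits well-formed JSON (no missing commas or quotes).
    return json.dumps(query)


def parse_year_tsv(tsv_text: str) -> dict[int, float]:
    """Parse a TSV response into {year: words_per_million}.

    Assumption (not confirmed by the source): a header row, then rows of
    'year<TAB>value'. Blank lines are skipped.
    """
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    series: dict[int, float] = {}
    for line in lines[1:]:  # skip the header row
        year, value = line.split("\t")[:2]
        series[int(year)] = float(value)
    return series
```

For example, time_chart_query("survival of the fittest") returns a query string that differs from the one above only in the search phrase.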
  

Query Structure

Groupings


{
 "method":"return_json",
 "words_collation":"Case_Sensitive",
 "groups":["year_year","school"],
 "database":"HistoryDissTest",
 "counttype":["WordCount","TotalWords"],
 "search_limits":{"word":["Health"]},
 "plotType":"heatMap"}
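
Requesting both WordCount and TotalWords lets a client compute its own normalized rate for each cell of the heatmap. The Python sketch below assumes that the return_json response nests one level per grouping variable (here year, then school) and ends in a list of the requested counts in counttype order. That shape and the function name are assumptions for illustration, so check them against a real response before relying on this.

```python
from typing import Any


def per_million_grid(response: dict[str, Any]) -> dict[tuple[str, str], float]:
    """Flatten {year: {school: [WordCount, TotalWords]}} into
    {(year, school): occurrences per million words}.

    The nesting order and list layout are assumptions about return_json.
    Cells with zero TotalWords are skipped to avoid division by zero.
    """
    grid: dict[tuple[str, str], float] = {}
    for year, by_school in response.items():
        for school, counts in by_school.items():
            word_count, total_words = counts[0], counts[1]
            if total_words:
                # Multiply first so integer counts divide cleanly.
                grid[(year, school)] = word_count * 1_000_000 / total_words
    return grid
```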

Query Structure

Grouping by words


{"method":"return_json",
"words_collation":"Case_Sensitive",
"groups":["year_year","school"],
"database":"HistoryDissTest",
"counttype":["WordCount","TotalWords"],
"search_limits":{"word":["1234"]},
"plotType":"heatMap"}

Query Structure

Grouping is possible by any metadata variable:


{
  "method": "ratio_query",
  "collation": "Case_Sensitive",
  "search_limits": {
    "word": ["concentrate attention"],
    "alanguage": ["eng"]
  },
  "groups": ["year", "country"],
  "database": "presidio"
}

Query Structure

Grouping by word allows mining comparisons across groups:

(This pulls the counts of every word that follows 'This' in JSTOR articles from 1881 through 1921, grouped by discipline and decade. The results are a good mix of nouns, though the query touches only about 0.1% of the data.)

{
 "database":"jstor",
 "groups": [
  "discipline",
  "words2.casesens as w2",
  "ROUND(year/10)*10 as year" 
 ],
 "method": "counts_query",
 "collation": "Case_Sensitive",
 "search_limits": {
  "year": {"$gt":1880,"$lt":1922},
  "word1": [
   "This" 
  ] 
 } 
}
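
Grouping by the following word turns the query into a table of counts per discipline, decade, and word, which can then be compared across groups. As a sketch, the Python below ranks, for one discipline, the words whose share after 'This' is most over-represented relative to all disciplines combined. It assumes the results have been loaded as (discipline, w2, decade, count) tuples; that row layout, the function name, and the simple share-ratio score are illustrative choices, not part of Bookworm.

```python
from collections import Counter, defaultdict
from typing import Iterable


def overrepresented_words(
    rows: Iterable[tuple[str, str, int, int]],
    discipline: str,
    top_n: int = 10,
) -> list[tuple[str, float]]:
    """Rank words by (share within discipline) / (share across all disciplines).

    rows: (discipline, w2, decade, count) tuples; decades are pooled here.
    """
    overall: Counter = Counter()
    by_discipline: defaultdict = defaultdict(Counter)
    for disc, word, _decade, count in rows:
        overall[word] += count
        by_discipline[disc][word] += count

    local = by_discipline[discipline]
    local_total = sum(local.values())
    if local_total == 0:
        return []
    overall_total = sum(overall.values())

    scores = {
        word: (n / local_total) / (overall[word] / overall_total)
        for word, n in local.items()
        if n > 0  # overall[word] >= n > 0, so no division by zero
    }
    # Highest ratio first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]
```

A ratio above 1 means the word follows 'This' more often in that discipline than in the collection as a whole; decades could be kept separate by keying the counters on (discipline, decade) instead.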
