Reading texts with Big Metadata: the Bookworm Platform
Benjamin MacDonald Schmidt
Fellow, Cultural Observatory @ Harvard
Ph.D. Candidate in History, Princeton University
Bookworm: Exploring Texts through Metadata
(http://bookworm.culturomics.org)
c. 1 million books
80 billion words
Library metadata via Open Library
Digital Public Library of America funding
Other versions for historical newspapers, journal articles, and more
Textual Metadata: Newspaper Locations
Bill Lane Center for the American West:
http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers
Textual Metadata: Correspondence Networks
Mapping the Republic of Letters
Textual Metadata: Trial lengths
Data Mining with Criminal Intent/The Old Bailey Online
Google Ngrams
http://books.google.com/ngrams
Comparing Custom Corpora
Bookworm arXiv
600,000 math and physics articles from the last 20 years
arxiv.culturomics.org
Bookworm ChronAm
4 million newspaper pages, 1840-1922
From chroniclingamerica.loc.gov
Bookworm: Exploring Texts with Metadata
The Bookworm API
Request data using JSON queries
Send queries as HTTP GET requests
Return data in JSON and TSV
An example query
{
  "method": "return_tsv",
  "counttype": ["WordsPerMillion"],
  "search_limits": {
    "country": ["USA", "UK"],
    "word": ["natural selection"]
  },
  "groups": ["year"],
  "database": "OL"
}
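A minimal sketch of how a query like this one might be packed into a GET request. The slides do not give the endpoint or the name of the URL parameter, so `ENDPOINT` and `PARAM` below are hypothetical placeholders; only the query itself comes from the example. The sketch builds the URL and does not contact any server.

```python
import json
from urllib.parse import urlencode

# Hypothetical values: the slides do not state the real endpoint
# or the name of the GET parameter that carries the query.
ENDPOINT = "http://bookworm.example.org/api"
PARAM = "query"

# The example query from the slide, as a Python dictionary.
query = {
    "method": "return_tsv",
    "counttype": ["WordsPerMillion"],
    "search_limits": {
        "country": ["USA", "UK"],
        "word": ["natural selection"],
    },
    "groups": ["year"],
    "database": "OL",
}


def build_url(query, endpoint=ENDPOINT, param=PARAM):
    """Serialize the query to JSON and attach it as one URL-encoded GET parameter."""
    return endpoint + "?" + urlencode({param: json.dumps(query)})


url = build_url(query)
print(url)
```

Changing `"groups"` or `"search_limits"` in the dictionary and rebuilding the URL is all it takes to ask a different question of the same corpus.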
The Response
year WordsPerMillion
[...]
1907 340.20526777
1908 341.83114533
1909 295.24911692
1910 282.24802327
1911 284.92406591
1912 283.89805752
1913 296.87614627
1914 332.76147647
1915 446.39889626
1916 428.87396542
1917 527.51044740
1918 647.48528263
1919 653.05159042
1920 507.23177682
1921 501.77615474
[...]
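A short sketch of reading such a response into a year-to-frequency mapping. It assumes the TSV body is tab-separated with the header row shown above; the `sample` string reuses four rows from the slide, and `parse_tsv` is an illustrative helper, not part of Bookworm.

```python
import csv
import io

# Four rows copied from the slide; the tab separator is an assumption
# about how the "return_tsv" body is delimited.
sample = (
    "year\tWordsPerMillion\n"
    "1917\t527.51044740\n"
    "1918\t647.48528263\n"
    "1919\t653.05159042\n"
    "1920\t507.23177682\n"
)


def parse_tsv(text):
    """Map each year to its WordsPerMillion value."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["year"]): float(row["WordsPerMillion"]) for row in reader}


series = parse_tsv(sample)

# Year with the highest rate among the rows parsed.
peak_year = max(series, key=series.get)
print(peak_year, series[peak_year])
```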
Using Cities as Words and Places
Using Individual Newspaper Locations
Rate of mentions of Topeka (each dot is one newspaper)
States and Regions are both as important as distance
Does race change imagined geographies?