Bookworm

Bookworm is an ongoing, open-source collaborative project I co-direct with Erez Aiden at the Rice Cultural Observatory.

Time browsers.

The stable version of Bookworm is typically deployed with the line chart browser initially developed by Martin Camacho, with contributions from many people, including Neva Cherniavsky, Billy Janitsch, and Matt Nicklay. It has deep roots in the Google Ngram browser that Erez Aiden made with JB Michel in 2010. To see some of the browsers we’ve built at the Cultural Observatory so far, visit the main Bookworm site. I also recommend my browser for 80,000 movie and TV scripts. A temporary version of the Open Library Browser is available here; it shows the different sorts of work that are possible.

Others have built their own browsers, including the Yale Vogue browser and the Hathi Trust alpha Bookworm browser (with very detailed metadata). We’re building out the latter through an NEH grant.

Other interfaces.

The Bookworm API allows a variety of different interfaces. I’ve built several on a dynamic D3 library; like the line chart browser, they allow direct reference to the original texts. For the 2015 State of the Union address, staffers at the Atlantic built a temporal map interface on top of a Bookworm repository, showing every place mentioned in the history of the speech. We also deployed the dynamic bar chart interface to show how often different presidents use words.

I also released two more experimental browsers around the SOTU. One illustrates the use of Dunning Log Likelihood for corpus comparison with the tools: you can see what words are used at significantly different rates by any two different presidents. The other embeds the Bookworm charts more deeply in an interactive site: it lets you read any given address in the context of all others, by clicking on the words to see how often other presidents used them.
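For readers unfamiliar with the statistic, here is a minimal Python sketch of a Dunning log-likelihood (G²) score for a single word across two corpora. It is an illustration of the general technique, not the code behind the browser; the function name, its parameters, and the convention of signing the score by direction are all assumptions made for this example.

```python
import math


def dunning_log_likelihood(count_a, total_a, count_b, total_b):
    """Signed G2 score for one word in two corpora (hypothetical helper).

    count_a, count_b: occurrences of the word in corpus A and corpus B.
    total_a, total_b: total number of word tokens in each corpus.
    The sign is an illustrative convention: positive means the word is
    relatively more frequent in corpus A, negative in corpus B.
    """
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both corpora must contain at least one token")

    grand_total = total_a + total_b
    word_total = count_a + count_b
    other_total = grand_total - word_total

    # 2x2 contingency table: (observed count, row total, column total)
    cells = [
        (count_a, word_total, total_a),             # the word, corpus A
        (count_b, word_total, total_b),             # the word, corpus B
        (total_a - count_a, other_total, total_a),  # all other words, corpus A
        (total_b - count_b, other_total, total_b),  # all other words, corpus B
    ]

    g2 = 0.0
    for observed, row_total, col_total in cells:
        if observed > 0:  # a cell with zero observations contributes nothing
            expected = row_total * col_total / grand_total
            g2 += observed * math.log(observed / expected)
    g2 *= 2

    # Attach the direction of the difference to the score.
    if count_a / total_a < count_b / total_b:
        g2 = -g2
    return g2
```

Scores near zero mean the word is used at about the same rate in both corpora. With one degree of freedom, an absolute value above roughly 3.84 corresponds to the conventional 0.05 significance level, though that threshold should be read loosely when many words are tested at once.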

I’ve also used a dotchart interface to illustrate differences in the language used on RateMyProfessors.com to describe male and female professors.

The heatmap visualization hasn’t found a fully mature form yet, but you can see a demo on the Social Security statename dataset here.

API and Documentation.

All code is up on GitHub under the MIT license. Documentation, focused particularly on how to deploy your own server, is here. An interactive sandbox demonstrating the Bookworm API is available here, and instructions for using the API are in the documentation here. If you plan to use it at any significant rate, please contact me or the host of the page.

Further development is ongoing, by me at Northeastern and by many at the Rice Cultural Observatory, directed by Erez Lieberman Aiden. Hosting has been generously provided by the University of Chicago’s Open Science Data Cloud, and funding has come from the Digital Public Library of America and the National Endowment for the Humanities.