Great Expectations & Hard Times: Dickens, Twitter, and R

This is a post about the experience of banging one’s head against the wall. Or, no. This is a post about trial and error. About Great Expectations and Hard Times. About learning. Really, really slowly.

If I had written this post two weeks ago, it would have begun differently. I was feeling good about my progress with R back then. We’d gone from writing code to making that code show things in the form of visualizations. To this end, I was able to confirm Moretti’s observation that book titles shrank in length over the course of the 18th and 19th centuries. That graph is below:


Then, I became interested in what the Library of Congress classifications could tell us about the books in our data frame. I made a rather colorful graph showing the growth in the number of books published per year, separated out by classification. Of particular interest to me is the way that the literature classification, P, grew at a more dramatic rate than any of the others. Here’s the graph:


After working with the State of the Union Addresses last week, I was eager to do some textual work myself. I looked around for some texts that might be more relevant to my interest in literary journalism, but, eager to get my hands dirty, I just decided to work with the provided Dickens texts. First, I read in the Dickens corpus and managed to tokenize it by word. From there, obviously, I went straight to the random walk generator, since we had so much fun with it in class and I wanted to see what it might look like in Dickens’ “voice.” The result was pretty cool, and really dialogue heavy. I added some punctuation for effect, but I think it sounds like Dickens:

“It was the first clear indication, Sir Leicester. If I had better be a comfortable home.”

“The slight noise they made me wery cold, I tell you,” said Mr Phunky Serjeant.

Snubbin replied, “The little mother had been a post at the door.”

He held his peace.

“Come,” cried the boy’s face with an air of Fleet Street amidst the loud screams of ladies and gentlemen. “Here but humiliation that I suffered it to the past history her present position as will outlive this danger and your manners.”

My friend Dombey with his disengaged arm and yard: “By his name I heard the old gentleman returned, the Captains said.”
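For readers curious about the mechanics, here is a minimal sketch of a word-level random walk of the kind described above. It is written in Python rather than the R used for this post, it is not the code from class, and the function names and the toy sentence are illustrative assumptions. Each word maps to the words that followed it in the text, and the walk repeatedly picks one of those followers at random.

```python
import random
from collections import defaultdict


def build_transitions(words):
    """Map each word to the list of words that follow it in the text."""
    transitions = defaultdict(list)
    for current, following in zip(words, words[1:]):
        transitions[current].append(following)
    return transitions


def random_walk(transitions, start, max_words=20, rng=None):
    """Walk from `start`, picking a random observed follower at each step.

    Returns the text as a string instead of printing it, and stops after
    `max_words` words or when the current word has no recorded follower.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < max_words and transitions.get(walk[-1]):
        walk.append(rng.choice(transitions[walk[-1]]))
    return " ".join(walk)


# Toy stand-in for a tokenized corpus (hypothetical, not the course data).
sample = "it was the best of times it was the worst of times".split()
transitions = build_transitions(sample)
quote = random_walk(transitions, "it", max_words=20, rng=random.Random(0))
```

Because repeated followers stay in the list, common word pairs are chosen more often, which is what lets the output echo the source’s voice.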

But now we come to the frustrating part. Undoubtedly related to my own inflated sense of confidence, I wanted to actually make something. I decided to take up the suggestion in the syllabus to create a Twitterbot, something I’d thought of trying to learn in the past anyway. My idea was to tweet text generated from the Dickens random walk generator.

This presented a few challenges. First, I needed to figure out how to limit the amount of text the generator produces. As we’ve seen, the code most of us were using would run forever. So, I figured out how to set a limit of 20 words (if I were being really precise I might have used character count to limit it). Then, following some directions I found online, I registered as a developer with Twitter and connected RStudio to Twitter using the twitteR package. I sent my first tweet from R last night:

Now that I knew I could tweet from R, how to make the content of the tweet some text from the random walk generator? This is where the banging my head against the wall came in. I spent hours, a lot of hours, trying to figure this out. The problem was that rather than “print” the text from the function, I needed to store it as a value. Finally, around 9pm last night, I tossed out the Hail Mary to Prof. Schmidt, who, even at that late hour, was kind enough to fire back a bit of code. Another couple of hours passed trying to adapt what he provided to what I had set up, until finally, at exactly 11:20pm, I published my first auto-generated Dickens quote. Note that I appended an ellipsis and a hashtag so that the people who follow me on Twitter wouldn’t worry that I had been hacked or lost my mind. Here’s the tweet:

And then, for good measure, and just to assure myself that I hadn’t dreamt this success, I sent another one this morning:

The next step will be to learn how to automate this, but for now, it’s fun to think that whenever I get stuck working in RStudio, rather than bang my head against the wall, I can launch an original, auto-generated Dickens quote out into the world.
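The character-count idea mentioned earlier, together with the appended ellipsis and hashtag, could look something like the sketch below. It is in Python rather than R and is only an illustration: the function name, the `#dickens` hashtag, and the 140-character limit are my assumptions, not what the bot actually uses. It trims the generated text at a word boundary so that the text plus the suffix fits the limit.

```python
def compose_tweet(text, hashtag="#dickens", limit=140):
    """Trim `text` at a word boundary so text + ellipsis + hashtag fits `limit`.

    `hashtag` and `limit` are placeholder assumptions for illustration.
    """
    suffix = "… " + hashtag
    budget = limit - len(suffix)  # characters left for the quote itself
    words = []
    length = 0
    for word in text.split():
        added = len(word) + (1 if words else 0)  # +1 for the joining space
        if length + added > budget:
            break
        words.append(word)
        length += added
    return " ".join(words) + suffix


short = compose_tweet("He held his peace.")
long = compose_tweet("word " * 100)
```

Trimming by characters instead of by a fixed 20 words means a run of long words can never push the tweet over the limit.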

Language Contributions to Subjects

For my first data exploration of the booklists, I started by looking at the subject trends of books from 1850–1922. Unsurprisingly, there is a general increase in publications for all subjects due to advances in printing technologies. I measured each subject’s presence in print culture in two ways: by the total number of words and by the number of publications. The two graphs are nearly identical. I could’ve made a mistake in the code, but the graphs do have different scales on the y-axis.

Because that graph wasn’t interesting, I focused on which languages were producing texts for each subject. I limited the languages to French, German, Italian, and Spanish. German overtakes French as the most published language for Art, Science, Technology, Music, Medicine, and Economics during the span of this graph. I wouldn’t have guessed, though, that French was a more common language for Military Science publications right before WWI.
Publication_Lang copy
I’ve added part of the output from my Faulknerian random walk generator because I really liked it. I added some punctuation and ellipses to make it look modern. The modernists loved ellipses.
the head of one mule appears, its eyes roll with soft, fleet, wild opaline fire; its muscles bunch and run at it, because jewel is quiet now. “up your…” i said, “thought it would take a rawhiding for thinking they meant it.” but the courthouse lifts among the pine clumps blotched up the ford, used to be enclosed in a cage in jackson where, his grimed hands lying light in a greek frieze, isolated out of him, trying to catch her. “darl catch her darl catch her” darl says. pa says, “reckon i better do.” pa says, “cash does not look back when she finds me watching her, her eyes and face kind of… kind of lived.” one part used no more than you can ride down. dewey dell says, “leaning above the edge of the minds of the…” cash says, “kind of pop eyes like she says she…

A Taste of Italy (Part 1 of ???)

My ultimate project in the class is going to be looking into the specific political and literary works of Gabriele D’Annunzio. This is going to serve as a preface/introductory post to get the gist of the state of Italian literature when he began publishing around 1880. I looked at the full bibliographic data we had been using in class, and found the 5 most frequently published Italian authors (in the Italian language) from 1880-1922. The list is as follows:

  1. Gabriele D’Annunzio (80 books published in Italian)
  2. Giosuè Carducci (53 books)
  3. Alessandro Manzoni (38 books)
  4. Edmondo De Amicis (38 books)
  5. Antonio Fogazzaro (37 books)

I created a simple yet nauseatingly colorful bar chart with this list:

Most Books Published in Italian 1880-1922
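The counting behind a top-five list like the one above can be sketched as follows. This is Python rather than R, and the field names (`author`, `language`, `year`), the `"ita"` language code, and the toy records are all hypothetical stand-ins. They are not the class data frame or its real results.

```python
from collections import Counter


def top_authors(records, language="ita", start=1880, end=1922, n=5):
    """Count books per author for one language and year range, most frequent first."""
    counts = Counter(
        record["author"]
        for record in records
        if record["language"] == language and start <= record["year"] <= end
    )
    return counts.most_common(n)


# Made-up records, only to show the shape of the data the function expects.
toy_records = [
    {"author": "Author A", "language": "ita", "year": 1890},
    {"author": "Author A", "language": "ita", "year": 1901},
    {"author": "Author B", "language": "ita", "year": 1895},
    {"author": "Author A", "language": "fre", "year": 1900},  # wrong language
    {"author": "Author B", "language": "ita", "year": 1870},  # before 1880
]
ranking = top_authors(toy_records)
```

Filtering on language before counting is what keeps a book published in French from inflating an author’s Italian total.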

Then I looked at these same 5 authors’ published books in other languages (I am hesitant to say books “translated” into other languages, because I did not distinguish between the same book appearing multiple times in multiple languages and a book appearing once in a non-Italian language). As you will see, D’Annunzio is still the most prolific, but it is interesting to note that De Amicis’s books have been published in the greatest variety of languages.

Popular Italian Authors in Other Languages


Finally, breaking away from the author-specific, I looked at the total number of books published per language for English, Italian, French, and German over this time frame. This is mainly to get a sense of the number of Italian books relative to other languages in the set. The one interesting thing I noticed is that in the early part of World War I, around 1914-1915, there seems to be a steep drop in German, French, and Italian, whereas the English-language books remain steady.

Books by Language 1880-1922

Wiggly Tales: A Random Walk Generator

We’ve been reading a lot of fairy tales around my house recently, so I wanted to see how well-spun a tale I could create by walking randomly through a collection of fairy tales. I selected four fairy-tale collections from Project Gutenberg to test this idea on. Code is on GitHub.

I selected these four collections:

The addition of the Arabian Nights stories to Western European fairy tales makes the random generator more interesting, sometimes throwing the geographical sense of the tale into a different place and a different world.

This version generated my favorite beginning: “once upon a time a man by the river yes he was looking straight into the deep waters skeletons of walruses.”

But other versions of the generator took an even darker turn. Here’s the raw text:

“once upon a great procession which was conscious of pain And sore regret of which she said nothing but torment and affliction that He sniffed about to give the ants were always running to and when he approached her they did not really birds but she bore thee Thou hast nothing to me Only tell me something Why this is what you say What is the news O my sister relate to me Art thou she whom he found it impossible to think of The old rough doll You are learned and wise men assembled together in his age and to nail up my mind every earthly care and sorrow with soft turf From the narrow walks and the Wezeer the father of Is both of you should care so much that renders men sinful and impure He fully realized the true the speaker s hand saying to each other till the morning following I have with me from first to last and then burst and fell fast asleep”


And here’s the story, with some punctuation that I added for “clarity”:

Once upon a great procession–which was conscious of pain and sore regret, of which she said nothing but torment and affliction that He sniffed about to give. The ants were always running to, and when he approached her, they did not really birds but she bore thee: “Thou hast nothing to me. Only tell me something: Why this is what you say? What is the news? O my sister relate to me! Art thou she whom he found it impossible to think of? The old rough doll? You are learned, and wise men assembled together in his age and to nail up my mind every earthly care and sorrow with.” Soft turf from the narrow walks and the Wezeer the father of Is, both of you should care so much! That renders men sinful and impure. He fully realized the true the speaker’s hand, saying to each other till the morning following, “I have with me from first to last,” and then burst and fell fast asleep.

And sometimes it’s important to be reminded of where your texts come from. I didn’t remove any text at all from the Project Gutenberg texts, which means that the copyright and distribution information could appear in our stories too. For example:

“The two grand annual festivals are observed with public domain eBooks Redistribution is subject to particular laws or rules with respect to our beetle to himself but the observance of this Wezeer So the porter approached the Distracted Slave of Love when his boat or playing in the lap of prosperity and the fear of him said the Fire drum Peter has gone away I ll do something in me.”
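If you would rather keep the license text out of your tales, one option is to cut each file down to the part between Project Gutenberg’s start and end marker lines before tokenizing. The sketch below is in Python rather than R and is not part of the generator I used. It assumes the files carry the usual `*** START OF THIS PROJECT GUTENBERG EBOOK ...` and `*** END OF ...` lines, and the function name is my own invention.

```python
import re

# Marker lines as they commonly appear in Project Gutenberg plain-text files
# (assumed; older or unusual files may use different wording).
START = re.compile(r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK", re.IGNORECASE)
END = re.compile(r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK", re.IGNORECASE)


def strip_gutenberg_boilerplate(text):
    """Return only the lines between the START and END markers.

    If a marker is missing, that side of the text is left untouched.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if START.match(line.strip()):
            start = i + 1
        elif END.match(line.strip()):
            end = i
            break
    return "\n".join(lines[start:end]).strip()


sample_file = "\n".join([
    "The Project Gutenberg EBook of Some Tales",
    "*** START OF THIS PROJECT GUTENBERG EBOOK SOME TALES ***",
    "Once upon a time there was a king.",
    "*** END OF THIS PROJECT GUTENBERG EBOOK SOME TALES ***",
    "Redistribution is subject to the trademark license.",
])
body = strip_gutenberg_boilerplate(sample_file)
```

Then again, leaving the boilerplate in is what produced the “public domain eBooks” festival above, so stripping it is a matter of taste.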

All this generator proves is that tales can be wiggly indeed.


Installing Git

Using git will make it easier to access the course files in RStudio.

Here’s how to get it.

On Linux, it’s probably already installed. Any package manager will include a git install.

On Windows, just follow the official instructions.

On OS X, you’ll need to install the “XCode command line tools.”

There are instructions online for doing this: the precise mechanism varies by operating system. You can always upgrade to Yosemite, the latest version, and follow these instructions. But don’t feel the need to upgrade if you’re on an old machine: it may slow you down.

On some versions, that will install git directly. But if you want to install more command-line tools, it may be worthwhile (on a Mac) to also install a program called Homebrew.

To install it, open the application “Terminal,” and paste the following:

ruby -e "$(curl -fsSL"

Follow the prompts. It will require your password.

Afterwards, you can install the latest version of git. The way to do this is to type at the command line:

brew install git

This works for all sorts of programs: you can also, for example, upgrade to the latest version of R by typing

brew install R

Some R packages you may encounter on your own have complicated “dependencies”: that is, they may need some other set of programs installed. (For example, to do advanced mapping in R, you may need the gdal toolset.) `brew install XXX` will frequently let you install a program without even having to find its website.

First post

Hello, world. The course is slowly coming online. See the syllabus outline for a description of the aims of the class: the first paragraphs are here.

Our course time is currently looking to be Thursdays from 4:30 to 7:00. Tuesday afternoons might also be a possibility. Anyone who has any predispositions on behalf of particular times should contact me at {}, at {}.

Data analysis in the humanities presents challenges of scale, interpretation, and communication distinct from those in the social sciences or sciences. It also, some argue, opens up new opportunities for creative storytelling and narrativity. This seminar will explore the emerging practices of data analysis in the digital humanities from both a critical and a practical perspective.

What light can algorithmic approaches shed on live questions in humanistic scholarship? What new forms of research are enabled by the use of data? What sort of data do practicing humanists want museums and libraries to make available?

Our goal in this class will be to explore the emerging forms of data analysis taking place in humanities scholarship, both in terms of applying algorithms and in terms of better investigating the presuppositions and biases of the digital object. We’ll aim to come out much more sophisticated in the use of computational techniques and much more informed about how others might use them.