Monthly Archives: October 2013

Data Visualization

While exploring the world of “humanities computing,” that is, using digital methods in research, some common themes I’ve noticed are the visualization of data and text mining. By “visualization” I refer to the use of computers to generate charts and models that show associations between, or elements of, data (usually quantitative in nature). Text mining, on the other hand, is the scanning of digitized texts for word occurrences and word patterns, and it can be open-ended or delimited by the researcher according to his or her interests.
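
As a toy illustration of text mining in this simplest sense, here is a minimal Python sketch that counts word occurrences in a single digitized text; the file name and the terms of interest are hypothetical placeholders.

```python
# A toy illustration of "text mining" in its simplest sense: counting
# word occurrences in a digitized text. "novel.txt" is a hypothetical file.
import re
from collections import Counter

with open("novel.txt", encoding="utf-8") as f:
    words = re.findall(r"[a-z']+", f.read().lower())

counts = Counter(words)

# Open-ended: the most frequent words overall
print(counts.most_common(10))

# Delimited by the researcher: only terms of interest
terms = {"whale", "sea", "ship"}
print({w: counts[w] for w in terms})
```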

Text mining was covered heavily by Matthew Jockers in Macroanalysis. His main concern is to demonstrate the viability of applying text mining to large sets of data; in his case, a corpus principally of Irish, British, and American literature. By using computer software, Jockers is able to quickly complete an analytical process that would take weeks if left to human mental faculties. He compiles data about the writing style, themes, and subjects in the corpus of novels to show their relationships with, for example, nationality and gender. He also represents this data visually, using charts and word clusters (a model where the size of a word or phrase and its position relative to the center of the cluster symbolize significance) to show trends or hidden aspects in the data, whether by year, region, nationality, etc.

This week I’ve encountered some essays which use the visualization technique in a slightly different way. In Shin-Kap Han’s “The Other Ride of Paul Revere,” I was introduced to the use of network modeling. Han uses models to provide a visualization of membership data which illustrates the connections between Paul Revere, Joseph Warren, and political groups that were otherwise disjointed at best. Caroline Winterer, for her part, uses network modeling to map the letter correspondence of Benjamin Franklin and Voltaire. A significant difference between these scholars and Jockers is that they use modeling to represent not aspects of text but more concrete historical and social realities.
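
To make the technique concrete, here is a minimal sketch of membership-network modeling in the spirit of Han’s analysis, written in Python with the networkx library. The membership lists below are a tiny illustrative fragment I have filled in for the example, not Han’s actual dataset.

```python
# A minimal sketch of membership-network modeling: build a bipartite
# person-group graph, project it onto people, and look for brokers.
# The membership data here is illustrative, not Han's dataset.
import networkx as nx
from networkx.algorithms import bipartite

memberships = {
    "Paul Revere":   ["North Caucus", "Long Room Club", "St. Andrews Lodge"],
    "Joseph Warren": ["North Caucus", "Long Room Club", "Tea Party"],
    "Samuel Adams":  ["North Caucus", "Long Room Club"],
}

# Bipartite graph: one node set for people, one for groups
B = nx.Graph()
for person, groups in memberships.items():
    B.add_node(person, kind="person")
    for g in groups:
        B.add_node(g, kind="group")
        B.add_edge(person, g)

# Project onto people: two people are linked if they share a group,
# weighted by how many groups they share
people = [n for n, d in B.nodes(data=True) if d["kind"] == "person"]
P = bipartite.weighted_projected_graph(B, people)

# Betweenness centrality flags the brokers who bridge otherwise
# disconnected groups, the structural role Han highlights
print(nx.betweenness_centrality(P))
```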

Whatever the end result of modeling and/or text mining, there’s a notable degree of cautious skepticism among these writers. Jockers suggests the methods of the digital humanities can only accompany, or are even secondary to, the traditional humanities method of close reading and interpretation. Winterer likewise states plainly that her study attempts to bump against the boundaries of digital mapping and that we have “good reasons to be wary of what digitization and visualization can offer us” (Winterer, 598).

While I think it’s too early for me to put in a good word for text mining, especially after an in-class experiment last week returned some peculiar results, the visualization side of the digital humanities seems, based on my inchoate impressions at least, to be a useful tool for interpreting data. Certainly there are real questions about how to properly interpret the meaning of the visualized data in its wider historical context, an end that, it could be argued, depends on the traditional “close reading” approach. But it is more striking, more eye-catching, I think, to visualize the transfer of letters than to weed out their individual destinations one by one, for example. We should be cautious about how we interpret models, but we can certainly accept them as an aid to our interpretation.

Identifying Themes

In Macroanalysis, Jockers makes a concerted effort to delineate the reach and the limitations of “big data” with regard to interpretation and meaning. In the foundational opening chapters, he is careful to qualify the products of computer-based text analysis, noting that “an early mistake … was that computers would somehow provide irrefutable conclusions about what a text might mean” (40). He explicitly denies that this macro scale is supposed to supplant analysis in that regard. He maintains this limitation carefully through all the literary and historical conclusions he comes to, and I found the way he toes this line in his section on topic modeling particularly interesting.

I am accustomed to thinking of thematic material, derived from between the lines of a book’s contents, as something subjectively interpreted, but I think now that is because I conflate the discovery of a theme with its interpretation. This relates back to Jockers’ description of research bias regarding Irish Americans, where a focus on urban environments in Eastern states clouded the understanding of important activities on the Western frontier. The raw material of a piece of literature, consisting of the author’s actual words, is the same no matter who reads it, and specific topics require the use of specific words. Through topic modeling, these words are drawn out to find themes that may be obvious in one text (whaling in Moby Dick) but not so much in others. Done on a macro scale, moreover, learning of the existence of these topics in a corpus of works too large for a single person to synthesize clearly reveals trends that open up other avenues for research, probably pursued on a micro scale.

This is where interpretation and meaning become foregrounded considerations, as subjectivity derives from the commentary the author makes with the theme and from the emphasis or connection the reader places on certain attributes of the theme due to their own biases; an author’s “artistry … comes in the assembly of these resources” (174). The trends highlighted by the macro reading serve to contextualize findings from a micro reading (even though macroanalysis often literally removes words from their surrounding text), because works of literature do not exist in isolation, and neither do historical events and activities. In Jockers’ own efforts mapping influence and discussing intertextuality in literature, he touches upon what seems like a valuable application for historical research. Just as literature synthesizes the (usually) unique voice and insight of an individual author with the common qualities of language, culture, and generation, and requires different methods to access specific data, history engages similar parameters of time and distance, and thus similarly benefits from the interplay of close and distant readings.

Zooming In and Out: Close Reading and Distant Reading

After reading through Jockers’ (Macroanalysis, 2013) introductory chapters, one concept really jumped out at me that I found very valuable for understanding the role of macroanalysis and “distant reading” in the humanities. One usually hears about the close reading/distant reading debate as something mutually exclusive: a humanist either employs close reading or distant reading when analyzing sources. Close readings suffer from what Jockers refers to as “anecdotal evidence” (8), where one hypothesizes overarching theories from a very limited sample. Distant readings, on the other hand, may be able to analyze more texts in different ways, but often result in a loss of the contextual information that a close reading can reveal.

Jockers, however, in his third chapter, “Tradition,” uses a word that I think offers a very valuable way of thinking about the close/distant reading debate. He uses the word “zooming” (in and out) to describe how close and distant readings can be complementary. To “zoom” is a useful and interesting way to describe this phenomenon. It implies a spectrum of scale in text analysis, and in digital work in general. Instead of choosing to do a close reading OR a distant reading of a given corpus, one can zoom in and out along this spectrum of textual analysis. Zooming establishes a complementary rather than a combative interplay between close and distant reading.

And this zooming can be employed across different projects or within the same one. One can “zoom in” on a single work or small corpus of sources, employing a traditional close reading, or one can “zoom out” and perform a distant reading of a million books. But this does not mean that once one zooms in, one cannot zoom out in the same project. The scholar analyzing a single work or small corpus of sources can still benefit from a distant reading of both those sources and the larger corpus of digitized works. They can perform a basic text analysis of word usage, word pairings, and structural components to inform their close reading. Moreover, one can zoom out even further and use a broader text analysis of related works from the period to confirm or support broader claims based on close reading. For example, one might postulate broader societal, political, and religious trends of a certain place during a specific time period based on a close reading; a distant reading of the larger corpus of works from the same region around the same time can support or disprove these speculations. A distant reading, therefore, complements a close reading by acting as a means to sidestep relying only on “anecdotal evidence.”

A distant reading can also be supplemented by a close reading of text. A pure distant reading runs the risk of becoming too abstract or removed from the texts. A close reading of a sample of works from a larger text analysis can support the broader phenomenological and discursive trends that distant readings attempt to reveal. Consequently, zooming in and out along this spectrum of scale allows close readings to complement distant readings, and large-scale text analysis to support the claims of in-depth study of a limited number of sources. Zooming in and out allows those working in the humanities—digital or traditional (for lack of a better word)—to make their arguments with a greater level of precision and efficacy.

Text Analysis

In addition to Jockers’ Macroanalysis, which is a formidable work built on a huge number of texts, we’re reading a short blog post by Cameron Blevins about topic modeling Martha Ballard’s diary.

This is an example of the sort of small-scale but potentially helpful interventions that humanists can carry out quite easily with some existing packages.

We’re going to do some topic modeling in class using R and Mallet; I’ll have some sets of text built up that you can work with, but it will be extra useful if, like Blevins, you’re able to find a text collection of your own that you can work with. If you have one, or an idea for one, let me know and we can figure out how to get it ready for class.
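
For anyone who wants to preview the workflow before class, here is a minimal sketch of the same kind of topic modeling in Python using gensim’s LDA implementation; we’ll use R and Mallet in class, but the steps (tokenize, build a vocabulary, fit a model, inspect top words) are analogous. The texts folder and the stopword list are placeholders.

```python
# A minimal topic-modeling sketch using gensim's LDA. The "texts" folder
# of plain-text files and the stopword list are hypothetical placeholders.
import os
import re
from gensim import corpora, models

STOPWORDS = {"the", "and", "of", "to", "a", "in", "was", "that", "i"}

def tokenize(text):
    # Lowercase, keep alphabetic tokens, drop stopwords and short words
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS and len(w) > 2]

docs = []
for name in os.listdir("texts"):
    with open(os.path.join("texts", name), encoding="utf-8") as f:
        docs.append(tokenize(f.read()))

# Map tokens to integer ids, then represent each document as a bag of words
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

# Fit LDA and print the top words for a few of the inferred topics
lda = models.LdaModel(corpus, num_topics=20, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics(num_topics=5, num_words=8):
    print(topic_id, words)
```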

Applying Macroanalysis to Individual Research Projects

This week’s readings struck me as something I could apply directly to my own interests and research topics. While reading Macroanalysis it became clear that I could use these methods myself. Imagine thousands of transcribed cookbooks (much like last week’s assignment) with methods of text analysis and data mining applied to them; think of what we could do with that information. One of the major issues I have grappled with in my study of cookbooks, in trying to find popular recipes and standard measurements and ingredients, is deciding which cookbooks to look at. In my last paper, on the history of baking apple pie, I focused largely on the Joy of Cooking as the be-all, end-all general cookbook for people interested in making pastry in the United States. Had I been able to compare hundreds or thousands of cookbooks side by side, looking for keywords such as “granny smith,” “butter,” and “lard,” I would have had a much easier time delineating the most popular recipes and techniques throughout history. Instead, I had to assume that in 20th-century America all women followed the standard piecrust presented in one cookbook, albeit a popular one. It would also be helpful in finding which recipes were most popular during specific time periods, information that could lead to broader analysis of food availability by region or cultural preference, in the same way that Jockers was first able to examine Irish-American authors and later a broader range of texts.
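
As a rough illustration of what that side-by-side comparison might look like, here is a hedged Python sketch that tallies keyword occurrences across a corpus of transcribed cookbooks by decade; the cookbooks folder, the file-naming convention that carries the publication year, and the keyword list are all hypothetical.

```python
# A sketch of keyword comparison across a cookbook corpus, grouped by
# decade. Folder name, file-name convention, and keywords are hypothetical.
import os
import re
from collections import Counter

KEYWORDS = ["granny smith", "butter", "lard"]

def count_keywords(text):
    text = text.lower()
    return Counter({k: len(re.findall(re.escape(k), text)) for k in KEYWORDS})

# Suppose each transcription is named like "1931_joy_of_cooking.txt"
totals_by_decade = {}
for name in os.listdir("cookbooks"):
    year = int(name[:4])
    decade = year - year % 10
    with open(os.path.join("cookbooks", name), encoding="utf-8") as f:
        totals_by_decade.setdefault(decade, Counter()).update(
            count_keywords(f.read()))

# Which fats and ingredients dominate in each decade?
for decade in sorted(totals_by_decade):
    print(decade, dict(totals_by_decade[decade]))
```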

I was further encouraged by reading about Martha Ballard’s diary and how it was digitized and analyzed. It seems these methods would give the benefits of close reading while also allowing for a broad study of several hundred primary sources. In Macroanalysis I was most impressed with the layout of the book: the charts, the descriptions. It appeared as though Jockers was doing what we read about last week, revealing his methods and the thought process behind his analysis. This made the work feel more transparent, which I believe helped me connect it to my own interests and the resources I am working with.

I should also note that while there are clearly several advantages to these methods, and to their presentation by the two digital humanists we have looked at, I can see why those involved in the postcolonial digital humanities discussion would have a hard time jumping on the macroanalysis bandwagon right away. Jockers was able to analyze works by authors from England, Ireland, Scotland, and the U.S. This raises the question of how long it would take, and whether anyone is working, to digitize the writings and primary sources of other nations, outside this white, mostly well-educated frame. What also interests me is whether these programs can be developed to handle different languages, even ones that don’t use typical Latin characters, and whether they would be as effective as the tools used on Martha Ballard’s scrawl. I tried to do some searching on my own, but do we know if scholars are currently developing programs to counteract this apparent English-language dominance?

Digitizing Family Heirlooms & Records

While scrolling through my Twitter feed, I found a link to an article from the Daily Mail in the UK about new Civil War artifacts that are being “discovered” and digitized as part of the sesquicentennial celebrations. Historians and archivists are working to expand the types of primary sources they include in their online databases by reaching out to families and encouraging them to search their attics for relics.

One statement that I found particularly interesting: “In Virginia, archivists have borrowed from the popular PBS series ‘Antiques Roadshow,’ travelling weekends throughout the state and asking residents to share family collections, which are scanned and added to the already vast collection at the Library of Virginia.” This is a different sort of crowdsourcing from the transcription and map-rectifying projects we discussed in class. These photos and documents are being made public by state libraries and archives; in fact, you can see some of them within the article as well.

From these initiatives we can see the value in opening up the digital humanities to public use. By making these artifacts accessible to the public and by expanding public interest, we can ensure the preservation of our nation’s heritage as well as expand our knowledge of the American Civil War.