DH Project, Conclusions and Future Research

(Note: This post is the concluding post in my series of posts exploring text analysis, visualizations, and stories about the Boston Marathon from the Our Marathon archive. Each post can be found on my personal blog or by navigating through the table of contents that I have included at the conclusion of this post.)

With these previous posts, I hoped to show a few things. First, I wanted to showcase how using various text analysis platforms, paired with some closer reading of these texts and some manual searches, allows for a much richer investigative experience. Word Clouds and Phrase Nets, despite their drawbacks, allow us to quickly visualize texts and formulate research questions when used effectively across various text analysis tools. Moreover, they can even help us devise some preliminary findings.

Second, I aimed to showcase some of the ways in which the Globe Stories were structurally different from the Public Submissions. In The Wounded Storyteller (1995), sociologist Arthur W. Frank distinguishes between three different types of narratives people formulate when dealing with a trauma. Among them are restitution narratives. Restitution narratives consist of a three-part structure that moves from healthy to sick and culminates in a hopeful and happy return to health or “normal” (in his case he was looking at those struggling with severe illnesses). From some of the conclusions mentioned in “Where are the Bombers?: What Can Word Clouds Tell Us?” and “#BostonStrong”, I think we can classify a significant number of stories as following this general structure. Especially in the “Boston Strong” stories, one can see this narrative pattern, particularly the final section stressing a hopeful return to health, but here the health of not only the individuals but also of the city as a whole. The Globe Stories are a bit harder to classify, and perhaps this is due to their brevity. These stories, on the whole, are shorter than their Public Submission counterparts. Many of them appear to be more like abstracts than full stories, which could account for some of the differences between the two sub-corpora.

These posts have shown me a lot about what kinds of questions and conclusions you can derive from analyzing text. However, I do have some criticisms of my own work. First, and I might have discovered this a little too late, I do not think my corpus was substantial enough to draw definitive conclusions about these stories. The Our Marathon archive has been collecting stories for a little less than eight months at this point, so I decided to take a look at everything we had. I had spent a lot of time learning how to do topic modeling with MALLET, thinking that it would be a substantial addition to my posts. Instead, I realized that no matter how I tweaked the model, the topics that I generated did not reveal any great insights. At that point, I decided to focus on my other sections.

That being said, I still think this series of posts has its value, not only for the preliminary findings I was able to extract, but also as a building block for future analysis with the Our Marathon material. Our Marathon is continuing to grow and we are planning quite a few outreach days to gather more submissions to the archive. As we gather more material, I can re-apply these methods to see whether my hypotheses still hold. Moreover, as we incorporate more stories, I hope to re-visit some of the questions that my preliminary investigation raised.

Also, the Our Marathon team is currently involved in a very substantial oral history project with people directly involved in and influenced by the marathon bombings (i.e., survivors, first responders, and local business owners). As these audio files are uploaded into the archive, we are making sure they are accompanied by full-text transcriptions. Once these transcriptions are in the archive, I could incorporate that text into my analysis and compare oral versus written narratives of experiences during the marathon bombings. Who knows? I might finally be able to generate a meaningful and insightful topic model about the Our Marathon stories.


  1. Share Your Story: Storytelling and the Boston Marathon Bombings
  2. Where are the Bombers?: What Can Word Clouds Tell Us?
  3. #BostonStrong
  4. “Fireworks or Cannons”: Phrase Nets of the Marathon Stories
  5. Conclusions and Future Research (this post)

My DH Project

In a previous blog post, I mentioned that I thought it would be useful to consider the value of the Digital Humanities outside of the framework of strictly research and interpretation. For my project, however, I decided to work within this framework, but from a different standpoint. From the readings in my introduction to the Digital Humanities, I inferred that its use-value, so to speak, is largely viewed in terms of digital publications, digital archives, and large-scale macro analysis, at least on the textual level of things. My thought was: what if digital methodology could be applied to textual analysis in a more traditional, “close reading” sense, as Matthew Jockers would say?

My plan was to use OCR software to transcribe PDFs of collections of writings by Ralph Waldo Emerson, an important figure in American intellectual history. I figured that with some background knowledge on Emerson’s life, ideas, and influence, I could use topic modeling on the transcribed texts to identify keywords to search the PDFs, thus making the research process more efficient, and, foreseeably, allowing the historian using digital tools to consult more sources than previously possible (as by using only one’s eyes, concentration, and caffeine, for example).

Halfway through my project I realized I had naively assumed too much about the ease with which digital sources could be transcribed, modeled, and searched. For one thing, I did not take into account how the text would be transcribed:

[Image: screenshot of the transcribed text, taken 2013-12-08]


I realized that I had assumed that the transcription would go smoothly, not that the column breaks would be interpreted literally! This was a particularly egregious error on my part as I had had prior experience with OCR software. Also, many words were improperly transcribed.

To my relief, the topic modeling produced some noticeable results, in spite of picking up the fragmented words. However, searching the PDF files for multiple words at a time was unfruitful, as the search engine proved to be programmed to search one page at a time rather than across multiple pages at once.

At this point I realized I had approached the project with the underlying assumption that the methods I employed here would be utilized by a “rogue digital historian,” if you will: one who takes the initiative to download, transcribe, model, search, and read the sources. I concluded that perhaps it would be better in the future to use my methods in the context of a digital project in its own right, as is done with crowdsourcing. While working as a research assistant, I am becoming familiar with programs such as Dedoose, which allow users to annotate and demarcate topics in texts. I think it would be beneficial for all historians to make digitally annotated sources available so that they could search quickly and efficiently for the subject matter they need. It might also be a useful way to reduce the opposition between more traditional approaches to history writing and the digital humanities, whose methods by and large seem to be conceived of as opposed to one another in some way.

On a more practical level, I’d certainly be curious to know if there is a way to quickly edit transcribed text documents, and if there’s a way to program “Ngrams” into MALLET for topic modeling.
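One possible starting point for both questions is to pre-process the transcribed text before it ever reaches the topic modeler. The sketch below is only an illustration in Python, not a feature of MALLET or of any OCR package: the function names `rejoin_hyphenated` and `join_frequent_bigrams` and the `min_count` threshold are hypothetical. It rejoins words that were split by a hyphen at a line break, then glues frequently repeated word pairs together with an underscore so that a bag-of-words tool would treat each pair as a single token.

```python
import re
from collections import Counter


def rejoin_hyphenated(text):
    """Rejoin words split by a hyphen at a line break ('intel-\\nlect' -> 'intellect')."""
    return re.sub(r"(\w+)-\s*\n\s*(\w+)", r"\1\2", text)


def join_frequent_bigrams(text, min_count=2):
    """Return tokens, with word pairs seen at least `min_count` times joined by '_'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(zip(tokens, tokens[1:]))
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and counts[pair] >= min_count:
            out.append("_".join(pair))  # e.g. 'self_reliance'
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


sample = "The intel-\nlect of man. Self reliance is hard; self reliance is rare."
print(join_frequent_bigrams(rejoin_hyphenated(sample)))
```

This only catches breaks that the OCR marked with a hyphen, and it would wrongly merge genuinely hyphenated compounds that happen to fall at a line end, so the output would still need a manual check.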

KMW regular expressions

Finding all words that contain a K, M, and a W: there are a few ways to do this, but one is to use the so-called “lookahead” operator.
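As a minimal sketch of the lookahead approach, here is one way it might look in Python (the language, the sample sentence, and the name `words_with_kmw` are assumptions for illustration). Each `(?=...)` group checks for one required letter without consuming any text, so the three letters can appear in any order within the word.

```python
import re

# Three lookaheads anchored at the start of a word: each one scans ahead
# through the word for its letter, then the final \w+ consumes the word.
KMW_PATTERN = re.compile(r"\b(?=\w*k)(?=\w*m)(?=\w*w)\w+\b", re.IGNORECASE)


def words_with_kmw(text):
    """Return every word in `text` that contains a k, an m, and a w."""
    return KMW_PATTERN.findall(text)


sample = "The workman saw a hawkmoth near the lukewarm milk; Mark waved."
print(words_with_kmw(sample))  # prints ['workman', 'hawkmoth', 'lukewarm']
```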


That gives:


Literacy in Digital Humanities

It is becoming clear to me that the tools used by the digital humanities are not only valuable for historical research but also necessary for historians to accurately interpret a growing set of secondary sources. Sources such as William G. Thomas, III, and Edward L. Ayers’ “The Differences Slavery Made: Two Communities in the American Civil War,” a digital article that was double-blind peer reviewed and published by the American Historical Review, require some degree of digital literacy to read and understand accurately. This source is being assigned in graduate and undergraduate courses on the Civil War, and students are being required to read, understand, and interpret it. In a discussion on the creation of this article, William G. Thomas, III, explained that the digital format, which the authors felt would give agency to the reader, instead resulted in confusion about the argument. The original navigation was a schematic diagram of the “‘multidimensional’, multi-sequential, multithreaded, flexible, modular, component, high articulation, high definition, dynamic” technique the article possessed.


This diagram proved more confusing than useful and was abandoned in favor of a more linear text-based navigation.


Many of the seven scholars chosen to decide on the best navigation were unfamiliar with digital publications.

Another example I have been thinking about concerning the importance of digital literacy for historians comes with the growing popularity of topic modeling, which was the focus of the Winter 2012 issue of the Journal of Digital Humanities. The articles in this issue explain that “topic modeling algorithms perform what is called probabilistic inference. Given a collection of texts, they reverse the imaginary generative process to answer the question ‘What is the likely hidden topical structure that generated my observed documents?’” (David M. Blei, “Topic Modeling and Digital Humanities”). Lisa M. Rhody, in “Topic Modeling and Figurative Language,” discusses the assumptions of the Latent Dirichlet Allocation (LDA) algorithm for topic modeling. Without some understanding of these algorithms, it is easy for a historian to misinterpret the topics produced or to give them excessive weight.

Just as a basic literacy in statistics is required to understand and accurately interpret graphs in historical works, literacy in digital interpretation is becoming a requirement. These two examples demonstrate to me that as digital publications and publications using digital material become more common in academic writing, historians are being required to become fluent in digital humanities.

Credit Transparency and the Collaborator’s Bill of Rights

I want to take this opportunity to expand a bit on what Abigail posted regarding collaboration on Digital Projects. In particular, I want to draw your attention to and work through the provisions of the “Collaborator’s Bill of Rights,” which is part of a larger report entitled “Off the Tracks: Laying New Lines for Digital Humanities Scholars.” This report was the result of a workshop on professionalization and ethics in digital humanities centers held on January 20-21, 2010. According to the report, “This workshop addressed the rapidly emerging phenomenon of alternative academic careers among the hybrid scholar-programmers now staffing many DH centers,” focusing particularly on the murkiness of potential professional development and career trajectories for those working on digital projects and utilizing digital methods. Hosted by Tanya Clement and Doug Reside, this workshop brought together many prominent digital humanists to discuss, write down, and possibly clear up some of this murkiness. One important section was on Collaboration, and the “Collaborator’s Bill of Rights” provides a set of guidelines for collaborating and crediting collaborators on digital projects. I think that when discussing publishing, particularly collaborative digital publishing, this document brings up a few issues with collaboration and provides us with an excellent place to begin a discussion. And while this document might not represent a consensus of the entire digital humanities community (if there ever really is one), it does represent the collaborative work of many well-known and widely read digital humanities practitioners. Moreover, by considering the four main sections of the “Collaborator’s Bill of Rights” we can not only explore the proper way to organize, document, and credit collaboration, but also look into what the major issues with collaboration seem to be.

The “Collaborator’s Bill of Rights” is divided into four main sections. The first section states:

1) All kinds of work on a project are equally deserving of credit (though the amount of work and expression of credit may differ). And all collaborators should be empowered to take credit for their work.*

This statement not only reveals a call for fairness in crediting work on digital projects, but also calls your attention to “all kinds of work on a project,” which reminds us that work on large, collaborative digital humanities projects often involves very different roles. It is very rare for a single individual (for the purposes of this scenario, let’s say an individual scholar) to be responsible for every part of a digital project. Often we require the help of web developers, programmers, librarians, and archivists. This entry, therefore, calls our attention to how a project manager is responsible for crediting each collaborator according to the magnitude and importance of their contributions to a project. In a more lopsided collaboration, one might be able to get away with an “Acknowledgements” page, similar to what we often see in scholarly monographs. However, one needs to be sure to attribute credit to each individual working on the project according to the importance of his or her contribution.

This section also brings up the agency of each collaborator to take credit for his or her work, but it is explored in more detail in the next section:

2) The DH community should default to the most comprehensive model of attribution of credit: credit should take the form of a legible trail that articulates the nature, extent, and dates of the contribution. (Models in the sciences and the arts may be useful.)

Section two is divided into two parts. The first concerns the DH community as a whole. It says the community should emulate the models of credit utilized by the sciences and the arts as a useful starting point. I think this makes sense: collaborative work appears to be more common in these fields than in the humanities, so we might as well borrow from what works for them. The key point here is to maintain a trail of credit that is as detailed and accurate as possible so that everyone receives the credit that they deserve. But the second part of this entry (text not provided in this blog post) is, I believe, more revealing. It concerns the rights of collaborators in being credited and crediting themselves in “Descriptive Papers & Project Reports,” “Websites,” and on “CVs.” These three subsections demonstrate the right of each collaborator to demand credit appropriate to their contribution. Furthermore, this section does not define what makes one contribution more worthy of credit than another. Instead it leaves it up to the collaborators to determine credit value. This is important because each project can then make an honest appraisal of what kind of credit different types of work deserve. At the same time, the “CVs” section gives the individual collaborator the right to “express their contributions honestly and comprehensively” on their Curriculum Vitae. Overall, these first two sections stress a need for transparency regarding the collaboration process and workload while maintaining academic honesty and integrity from both the project as a whole and the individual collaborators.

The third section of the Bill of Rights concerns problems with credit and access to a project that has a particular institutional support:

3) Universities, museums, libraries, and archives are locations of creativity and innovation. Intellectual property policies should be equally applied to all employees regardless of employment status. Credit for collaborative work should be portable and legible. Collaborators should retain access to the work of the collaboration.

This brings up another issue with digital projects. Since digital projects are often funded by and linked to a particular institution or group of institutions, issues of intellectual property often come up when a collaborator (by choice, or by necessity) moves from an institution participating in a project to another. Again this section stresses an ethical consideration for collaborators. Even if they leave the institution supporting a digital project, their contribution should still be recognized or credited and they should not be “locked out” of further developments in the project. I think this section is of vital importance for a collaborator, because if the digital humanists writing this report felt the need to address this, then it obviously has been a problem in the past. This entry follows a key theme of the rest of the Bill of Rights: everyone should be accountable and credited for his or her contribution to a project, regardless of his or her role (technical, research, metadata, etc.) and institutional affiliation. The key principle here is not only for each collaborator to receive the credit they deserve, but also for collaborators to have appropriate access to their own work.

The final section looks into an important part of any digital project: funding. Recognizing that those who fund digital projects often have a significant influence on the vision and purpose of a digital project, the Bill of Rights states:

4) Funders should take an aggressive stance on unfair institutional policies that undermine the principles of this bill of rights. Such policies may include inequities in intellectual property rights or the inability of certain classes of employees to serve as PIs.

It again stresses ethical consideration of and credit for individuals’ contributions to a project, but looks to funders as a top-down way of making sure this happens. I understand this as a last-chance approach. If individuals working on a project fail to accurately and ethically attribute credit to collaborators, and institutional policies also interfere with transparency in crediting contributions, then it is the role of the funders to step in and make sure it is done correctly. I think this might be a bit naïve, especially since the very institutions that are preventing this transparency of credit are funding digital projects. However, I do think it is important to recognize the vital oversight role funders play on digital projects and to encourage them to use that role ethically and responsibly. Overall, these principles encourage open and honest collaboration at every scale of a digital project. From individual collaborators to project managers, institutions, and funding sources, the “Collaborator’s Bill of Rights” tries to create an open and ethical environment where digital projects can flourish through effective collaboration, without each collaborator worrying that they will not receive credit for their work.

Now I know this did not directly have too much to do with digital publishing, but I feel these principles on collaboration are crucial to understand in order to begin conceiving a collaborative digital project. Furthermore, I think having a plan for collaboration and crediting contributions allows for more effective and creative digital products.


*All quotes are taken directly from the “Collaborator’s Bill of Rights”.

Physical to Digital (and Everything in Between)

The focus of the Summer 2012 issue of the Journal of Digital Humanities was the process of translating analogue materials into the digital world, and the possibilities for greater understanding resulting from the shift in medium. The editors, Dan Cohen and Joan Troyano, in their introduction “The Difference the Digital Makes,” point out that despite the primary focus on “the final product” displayed on the web “…we remain cognizant of this transition that artifacts of human expression have taken.” Delving into this transition then sets the stage for the potential scope of new projects.

This issue relates to some of the discussions we had earlier this semester in the “making things digital” section of the class, where we contributed our own efforts to this transition. Analyzing how our participation fit into the greater scope of digital material, we fixated on detailing how we performed our process. The focus on perspective gained during creation is part of Craig Mod’s article “The Digital-Physical,” which explores the possibility of giving a framework to “…our journeys that live largely in digital space.” Creating these “edges” for digital productions underlines a reciprocal relationship between digital and physical, where understanding of the whole system is derived from the movement from one form to the other.

In 1999, the difference the digital was making was already evident to Edward L. Ayers, who noted of archives transitioning into the digital: “These projects…create capacious spaces in which users make connections and discoveries for themselves. Such archives take advantage of the mass, multiplicity, speed, reiteration, reflexivity, and precision offered by computers.” Translating and manipulating analogue material is not only about translating the text into the digital, but also about using digital tools to manipulate the complete physicality of an item. Sarah Werner, in “Where Material Book Culture Meets Digital Humanities,” looks into the widely accessible body of digital texts available online and focuses on both the quality of current holdings and the possibilities for future scholarly insights. Essentially, she points out the flaws created in translation from analogue to digital (quality of digital imaging and reproduction), but shifts to textual manipulations possible only with digital tools (multi-spectral imaging, densitometers, smell analysis, and virtual physical manipulations). The final analysis here was to “not limit ourselves to reading the digital in the same ways we’ve always been reading.” For Werner, reading text digitally includes much more than simply accessing and comprehending words on a page available online. Other authors writing in this issue also describe their own journeys between digital and physical materials through production (Booker) and communication (Terras), and the insights gained from each stage of this journey. What emerges at the end is a sense that we have only scratched the surface of digital re-imaginings of physical material.

Collaboration in Digital Projects

Collaboration in the digital humanities manifests itself not just among those involved in creating digital scholarship but also with the audience. Through comments and feedback sections, social networks, and other mediums, historians are able to engage with their audience on an unprecedented level. Digital humanists continue to navigate these complicated relationships in every digital project they create. I am not convinced that the strict rules that govern publication in the academy will ever apply to digital works, because of the vast and very diverse nature of digital scholarship. It is important to separate “traditional” work from the digital. As William G. Thomas states, “We historians might have to cast aside our illusions of permanence and our penchant for the ‘cardigan.’ If we experiment, however, we might discover that the openness of the digital medium is what allows us both to create vibrant new scholarship and to speak to a rising generation of students.” This is demonstrated in the articles included in the most recent issue of the Journal of Digital Humanities, which focused on “Expanding Communities of Practice.” As with the authors of “The Differences Slavery Made,” the authors of these articles balanced “critical theory, the needs of the projects’ constituents, and the mixed opportunities and constraints presented by a respective technology” in creating digital projects. The creators of “The Differences Slavery Made” also detailed their struggle with the imposition of traditional historical practices on their “digital article.” I chose to highlight two of the articles from this issue, given their relevance to this week’s readings.

In the first article, “Changing Medium, Transforming Composition,” Trey Conatser discusses the importance of the visibility of the web in creating digital scholarship. This is an idea we have discussed in our class, particularly with the article “Hermeneutics of Data and Historical Writing.” The digital world allows for greater collaboration between “the supply” and “the demand,” as described in Dan Cohen’s article “The Social Contract of Digital Publishing.” Conatser found that by making the methods behind their work visible to others, his students (who were learning XML) were “empowered to take command of their work.” The students also came to realize that digital scholarship is about both the argument and the form, another idea that was discussed in creating “The Differences Slavery Made” as well as in several of the other articles we’ve read in class.

In another article, “Media NOLA: A Digital Humanities Project to Tell Stories of Cultural Production in New Orleans,” Vicki Mayer and Mike Griffith detail the experience of creating a website showing the many contributions to the creation of New Orleans culture. This work is mostly done through Tulane University. Mayer and Griffith discuss at great length the collaboration between students and faculty of different disciplines and cultural institutions in New Orleans, as well as the use of social media to get feedback from users. They also learned the importance of aesthetics in creating a web platform, once again showing how, as with “The Differences Slavery Made,” digital projects require the combination of both argument and form throughout the planning process, a factor that separates them from traditional historical methods. The other articles in this issue all worked toward the same general idea: that greater collaboration in the digital humanities, both between authors and audience and between scholars from different fields, helps the field to grow and advance in unique and exciting ways.