Monthly Archives: September 2013

Transparency & Selectivity

In light of the issues concerning Time on the Cross and the marked lack of transparency surrounding the authors’ methodology, one aspect of data accessibility stuck with me as an extension of the qualifications we discussed in class. Bringing methodology to the foreground of historical writing, as described by Gibbs & Owens in the Hermeneutics of Data and Historical Writing, integrates the communication of processes with the findings. This serves to incorporate data and quantitative analysis with traditional uses of historical sources in a manner that assigns each a function appropriate to its scope, which the authors describe when they note that “as historical data becomes more ubiquitous, humanists will find it useful to pivot between distant and close readings”. Distant readings call for innovative methodologies and collaborations, and they often serve as the mechanism for framing questions and directing attention to previously imperceptible patterns and trends, which cannot be separated from the findings themselves. Gibbs & Owens also note that “historians need to treat data as text”, which seems to summarize the dichotomy between text and data as a false separation. Both are already treated similarly: acquired, manipulated, analyzed, and represented. This broader, more exploratory approach to data also highlights the fundamental difference from purely mathematical hypothesis testing, which ignores the subjectivity of historical information.

Fogel and Engerman certainly generated a fair amount of discourse through their own procedure. However, as Gibbs demonstrates through Owens’ own research, the type of additive commentary that occurs during a project where the methodology is laid bare allows researchers greater access to critique, commentary, expansion, and inspiration. Where the scope of and access to data is expanding, greater weight is given to a researcher’s methodology than before, and the interaction is part of the interpretation. However, as Robert Darnton describes in his article, access remains tied to money and power, and there is an ever-shifting balance between private and public interests. Because information is also treated as an asset, this valuation of data seems to guarantee conclusions warped to reflect the selectivity of available material. Darnton references the Enlightenment thinkers to frame the disconnect between accessibility and privilege, where ideals fail to reflect reality. This also seems applicable to the discussion Gibbs & Owens initiate. Through their emphasis on transparent methodology, there might be a window for forces that correct the bias of data filtered by private interests. Though Darnton has faith in Google as a benign force, it may be that the vigilance he calls for is most appropriately generated when analyzing these works made denser through the additional scope of methodology, in whatever form that ends up taking.




Adding Documents for Marxists Internet Archive

For my first real venture into Digital History I chose a relatively simple and practical endeavor. I am digitizing a 10-page discussion titled “What is Black Consciousness?” between Steve Biko, a South African revolutionary during apartheid, and a white South African court during Biko’s trial. This is a piece of a larger body of Biko’s writing, speeches, and other primary source material smuggled out of South Africa after his death from a head wound suffered while in police custody in 1977. While Biko died before the end of apartheid, he was instrumental in the revolutionary movement, and his death was a major turning point in international attention to apartheid oppression. I will include a sample of this document.

“We come from a background which is essentially peasant and worker, we do not have any form of daily contact with a highly technological society, we are foreigners in that field. When you have got to write an essay as a black child under for instance JMB the topics that are given there tally very well with white experience, but you as a black student writing the same essay have got to grapple with something which is foreign to you — not only foreign but superior in a sense; because of the ability of the white culture to solve so many problems in the sphere of medicine, various spheres, you tend to look at it as a superior culture than yours, you tend to despise the worker culture, and this inculcates in the black man a sense of self-hatred which I think is an important determining factor in his dealings with himself and his life.” (Biko, 1976)

The Marxists Internet Archive (MIA) has been an invaluable resource for me in researching international revolutionary movements. It is an all-volunteer, non-profit public library, started more than 20 years ago in 1990. MIA abides by seven fundamental tenets found in their charter. They promise to always be 100% free; to always be a non-profit organization; to always be based on democratic decision making; to always have full disclosure; to always remain politically independent. Their priority is to provide archival information and to present content in a way that is easy to access and understand. For more information:

I have signed up as a volunteer with MIA and will be working on three or more digitizing projects in the next six months. This is very interesting and helpful to me, as I will be working with these types of sources on a regular basis. I have used this site extensively, as it is often the only location where I can access a primary source that was originally written in a language I cannot read, or that can only be physically found in one or two libraries internationally. This site is often the first place that a source has been digitized or translated.

I must acknowledge that this site has its own limitations. First, it contains only a sampling of all sources that could be published and is what the Digital Humanities call an Intentional Archive. Second, MIA is compiled by volunteers who make mistakes. Unlike Wikipedia, however, this is a collection of primary sources, so the mistakes are more in the way of grammatical errors and translation mistakes. One of the most valuable aspects of this archive is that it is compiled by people in many countries, speaking many different languages. This makes it accessible not only to Western historians with limited language abilities but also internationally, in more than 45 different languages. One of my biggest concerns about digital history is access. If history is being digitized and perhaps made more accessible to Western countries, is it also becoming less accessible to people reading non-Western languages? MIA is one of the few digital archives that has made diverse language accessibility such a focus.


Explaining Digital Tools and Methods in History Writing

According to Gibbs and Owens in “Hermeneutics of Data and Historical Writing,” new digital methods of data collection, analysis, and display require a new “level of methodological transparency.” They advocate open documentation and presentation of the process of working with data. This is not only to inform readers of how they reached their conclusions, but also to familiarize readers with the different ways one can use data for historical research and analysis. Gibbs and Owens state, “We need to teach each other how we are using and making sense of data.” Gibbs and Owens also use “data” in a much broader sense. To them, data is not synonymous with evidence. Digital historians and humanists work with data not only as a confirmatory exercise, but can also use digital tools and methods as a means of discovering and framing new research questions. The mere availability of certain data sets, and the tools for interpreting them, opens up exciting new options for historical inquiry. Stephen Ramsay calls this the “hermeneutics of screwing around,” which includes using digital tools to formulate research questions and embracing creative failures that steer your research or analysis in a particular direction.

Gibbs and Owens, however, call for open and available documentation of the process of using and analyzing data, even these initial steps of discovery and creative failure. I think this open documenting is important and useful. It not only allows your readers to understand how you collected, used, analyzed, and manipulated your data, but also serves as a way for you to familiarize your audience (particularly your non-digital colleagues) with using these new tools and data sets.

But I cannot help but wonder what this transparency will look like. Let us assume that someone is trying to publish a traditional monograph while being as transparent as Gibbs and Owens are suggesting in their piece. Will it be in the form of an exhaustive and detailed introduction? That might discourage readers from looking at the rest of your work. What if it were included at the end of the monograph in the form of an appendix (or appendices)? That might discourage readers from even reading the section. I know from personal experience that appendices are often skimmed over, if not ignored entirely. What about blogging about the process of researching and writing your monograph? This would allow you to avoid the first two problems, but by separating it from the monograph you risk your readers not being aware of or not having access to your blog. It would have to be explicitly stated in the monograph, and, even then, you cannot guarantee your readers will check out your site. The most effective way of integrating this transparency into your text might be to present your monograph in a digital format, such as Gibbs and Owens’ chapter, layering your methodology and process through a series of visualizations, hyperlinks, and other pages. But even that has its drawbacks. In academia, where peer review and publishing still play such a significant role in hiring and tenure decisions, can someone other than a tenured professor risk presenting their entire work online? Even then, would they?

Now I do not have any “answers” to this issue, but I think it is useful for anyone considering doing digital work to think clearly about how they are going to represent their research and analysis in the most effective way. Maybe an exhaustive introduction to the digital work would be best. Maybe it is best decided on a per-project basis. Or maybe one might consider a combination of these strategies (e.g., presenting both a digital and a print format, or including a digital companion to a hard-copy work). If you are trying for the kind of transparency that Gibbs and Owens are suggesting, these are issues you must confront.

Rectifying Maps for the NYPL

For this week’s “making things digital” class, I decided to do something a little different than digitizing text. When I saw Ben’s post of suggestions, I was immediately drawn to the last option: Rectify Maps for the New York Public Library. I had done a bit of basic GIS before and was interested that they had an in-site rectifying tool rather than requiring complex and expensive GIS software.

I went to the site, watched their video tutorial (not the best quality video, but it told me exactly what I needed to know), and decided to start giving it a try. Rectifying a map involves three main aspects: the historical map, the base map, and your control points. In order to rectify a map, the user places control points on similar locations on both the historical map and the base map. These control points are paired to each other. By carefully placing enough of these control points, the user can manipulate the historical map to match up with the modern base map.

The next step was to choose what kind of maps I wanted to rectify. I wanted to choose a place and scale I was familiar with, so I started searching for historical maps of my home state, New Jersey. I found two maps that I thought were very interesting and began working on rectifying them. One is a 1795 engraving of New Jersey by Joseph Scott from The United States Gazetteer (Philadelphia) and the other is an 1873 map of New Jersey from the Atlas of Monmouth Co., New Jersey. Here are images of the historical maps before rectifying them:

NJ Map 1795 (before)

NJ Map 1873 (before; some control points shown)

I have decided to show some of my control points on the 1873 map so that you can see what they look like. In order to properly rectify a map, you must have more than one control point. The NYPL site requires that you have at least three control points in order to rectify the historical map with the base map. Also, the warper includes a mechanism that reports how far off (the margin of error) each of your control points is between your historical map and your base map. The tutorial video instructs you to make sure each of your control points has a margin of error of less than 10. Going into this, I assumed that more control points linking my historical map to my base map would result in a more accurate rectified map. However, this is true only if you can get your control points under that margin of error of 10. Also, adding more control points can often distort the margin of error for your other control points. So it is not always best to have the greatest number of control points; instead, one should place control points in optimal positions yielding the least margin of error. Each map is also unique, so you need to work out the best arrangement and number of control points for it. I am not saying that my rectified maps are perfect (they are far from it), but I found that around six control points did the trick.
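To make the behavior of control points more concrete, here is a minimal sketch of how a warper might turn control points into a transformation and a per-point error. It assumes a simple least-squares affine fit; I do not know which transformation or error measure the NYPL warper actually uses, and the function names and coordinates below are hypothetical. With exactly three points an affine fit passes through every point, so each error is zero. A fourth point that disagrees even slightly forces a compromise, and the error is spread across all four points.

```python
from math import hypot


def solve3(a, b):
    """Solve a 3x3 linear system a * x = b by Gauss-Jordan elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("control points are collinear or too few")
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine map taking src (historical map) to dst (base map).

    Returns [[a, b, c], [d, e, f]] so that u = a*x + b*y + c and v = d*x + e*y + f.
    """
    rows = [[x, y, 1.0] for x, y in src]
    # Normal equations: (A^T A) p = A^T t, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        coeffs.append(solve3(ata, atb))
    return coeffs


def residuals(src, dst, coeffs):
    """Distance between each transformed control point and its base-map target."""
    out = []
    for (x, y), (u, v) in zip(src, dst):
        pu = coeffs[0][0] * x + coeffs[0][1] * y + coeffs[0][2]
        pv = coeffs[1][0] * x + coeffs[1][1] * y + coeffs[1][2]
        out.append(hypot(pu - u, pv - v))
    return out


# Hypothetical control points: pixel positions on the historical map (SRC)
# paired with positions on the base map (DST). The fourth pair is placed
# slightly off from where the first three pairs would predict it.
SRC = [(0, 0), (100, 0), (0, 100), (100, 100)]
DST = [(10, 20), (210, 20), (10, 220), (215, 228)]

three = residuals(SRC[:3], DST[:3], fit_affine(SRC[:3], DST[:3]))
four = residuals(SRC, DST, fit_affine(SRC, DST))
print("errors with three points:", [round(r, 3) for r in three])  # all zero
print("errors with four points: ", [round(r, 3) for r in four])   # about 2.36 each
```

In this toy case the one misplaced point raises the error of all four points equally, which is one plausible reason why placing a new control point can change the margin of error reported for points that were placed earlier.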

After placing these control points, I cropped the historical map a bit so that it would fit better on the base map, then I clicked “Warp Image!,” then played around with the transparency settings of the historical map in order to produce these new rectified maps:

NJ Map 1795 (after)

NJ Map 1873 (after)

Now I will offer a few final thoughts about the process (although I definitely expect to do this again). First, rectifying maps is a frustratingly precise process. Borders, state lines, and towns on the base map are often in very different locations (or non-existent) on the historical map. Also, when you are placing control points you have to be constantly aware not only of whether your control points line up with the correct location on both the historical map and the base map, but also of how each control point affects the margin of error of every other control point you placed. For example, I tried rectifying a map of the United States and was able to place three control points with very little margin of error for each. I placed them at the northwestern part of Washington, the southwestern corner of California, and the southernmost point in Texas. However, no matter where I placed the next control point, the margin of error seemed to skyrocket for all four points as soon as I placed the fourth one. This might have been a problem with the first three points, but it did prompt me to scale down my efforts from maps of the entire United States to New Jersey maps.

There is one last thing that I wanted to comment on, and it deals with the base map. I was thinking about how the entire process of rectifying these maps concerned warping the historical map to fit the base map. This one-way process assumes that the base map is the standard of accuracy and that all other maps must conform to its scale and borders. I think that this assumption is taken for granted. I understand the need to have a standard map, but could it not also be useful to have the program do the reverse? What if it generated an overlay of the historical map on the base map AND an overlay of the base map on the historical map? What kind of value would an arrangement like that have? I am not sure, but I think it is something that at least needs to be considered. Also, there are many historical maps that contain different information than the base map and are, therefore, incompatible with the rectifying process (although they are still listed on the site). I just wonder whether, by placing such confidence in the base map, we are losing important information from the historical map. I’ll finish this post by showing one of those maps that is listed on the NYPL site but could not possibly be rectified to our modern base map. There are many of them, but this one in particular stuck out as a very valuable and informative map that is completely incongruous with the base map.

1671 Depiction of Floridans

Transcribing Letters

Trinity College Dublin has just started a public humanities project called the Letters of 1916, which is creating a digital archive of crowdsourced letters written around the time of Ireland’s Easter Rising of 1916. They have compiled letters from November 1915 to October 1916 found in institutions such as the National Library and National Archives, and have also issued a call for letters from private collections. They are looking to create a collection covering a variety of subjects, such as art, politics, WWI, and the Easter Rising. I’ve registered myself to contribute by transcribing the letters they have on the site so far.

Accessibility in the Digital Humanities

Accessibility is both a positive and a negative force when discussing the possibilities of the Internet age. At the dawn of the Internet, its possibilities, though they seemed endless, were difficult to grasp for those who could have benefitted most. Unintended consequences due to a lack of foresight into the Internet’s capabilities are seen in nearly every industry. They are probably most evident in journalism, where newspapers originally made their content free to all users, then hid it behind paywalls, and found that these actions had plenty of consequences for print, as well as for born-digital news sources that benefitted from their open access. This is a problem in the humanities and in libraries as a whole, especially regarding open access under copyright, as discussed in the article on Google Books.

In our first week of readings, Cohen and Rosenzweig view this newfound accessibility on the Internet as an advantage for historians because of the ability to reach a wide audience at “zero marginal cost”: “The Internet allows historians to speak to vastly more people in widely dispersed places without really spending more money—an extraordinary development.” They also discuss inaccessibility and the problems that stem from the digital divide in computer ownership, specifically in a global context, as well as the problems of monopoly. These arguments tie in directly to our readings this week.

The article on Google Books discusses the desire for open access as being an Enlightenment principle, one that our country was founded on. Its author places the responsibility for this open access on libraries that missed their chance in the early days of the Internet to make more content available to their users. Google picked up this mantle in 2004 by launching Google Books and facing down copyright lawsuits brought by authors and publishers alike. Therefore, Google has the advantage of controlling all digitized copies of books that are put on the web. The author voiced his concerns about payment. Would we see what happened in print journalism or with scholarly journals and libraries happen with Google Books? Would the payments become so steep that libraries would be forced to dedicate large portions of their already-stretched-thin budgets to give their users open access?

Accessibility is also touched upon in The Hermeneutics of Data and Historical Writing. The authors here believe that historians need to rethink the nature of historical writing by de-emphasizing the narrative and giving greater access to their data-based methodology. They see this as a way to break down interdisciplinary walls as well as walls between researchers and their audience. By changing the format of historical writing to allow for these “twentieth century footnotes,” we would see a greater understanding, not only in our field but also in others, of how to use newly available data to become more accessible. Not only would historians’ work become more “user-friendly,” but it would also encourage more historians to think outside of the more linear and traditional ways of using data in historical work.

The issue of accessibility is not going to disappear overnight. In fact, on Monday, articles such as this one from Forbes appeared about the continuing legal battle over Google Books and its right to digitally share books with all users. By making more content available and their methodology more transparent, historians and all those practicing in the digital world can find ways around the unintended consequences of an open Internet.

The Potential of Cliometrics

Anyone who reads Robert Fogel and Stanley Engerman’s “Time on the Cross” knows immediately that its claims were bound to cause controversy. A significant part of this controversy, no doubt, stems from the heavy use of numerical data in the analysis of the institution of slavery. The numbers seem cold and barely capture the macabre-colored picture of slavery that we are used to encountering in more traditional, humanistic expositions of slavery. Worse still, at times Fogel and Engerman’s language seems to allude to the “Uncle Tom” image of a pitifully subservient and obedient black when describing the typical slave. Although the authors did not mean to suggest this (they say they admire black achievement under the adversity of white overlordship), one cannot help but conjure the image when they speak, for example, of the supposed motivation of slaves to be appointed to “better” roles on the plantation.

Despite the controversy, I think it’s a shame that this study may have caused cliometrics to fade completely into the background of historical research, because it offers some useful tools for historians. In particular, I thought its capabilities as a tool for comparative studies were strong. One relatively non-controversial section of “Time on the Cross” was the first chapter, where Fogel and Engerman discuss some of the differences between slavery in the United States and in the Caribbean. They compare slave imports into the Caribbean and the U.S., foreign-born slaves with the rest of the U.S. population, and the growth of the actual slave populations in the U.S. and Caribbean to expose very real differences between the slave trades of the U.S. and the Caribbean, differences that are in fact made more explicit numerically. The reader gets a harrowing portrait of slaves being sent to the Caribbean in droves to replace those who have succumbed to tropical diseases, while in the U.S., the slave population became “naturalized,” creating a potentially different dynamic to be further explored by historical study.

Steven Ruggles’ “The Transformation of the American Family Structure” is another example of the use of quantitative comparisons to show intriguing facts. Some scholars claim that the traditional family structure never existed, we learn. Yet Ruggles suggests that although these “extended households” might have been a minority, they were still an ideal that served to direct behavior more often than not. By using life expectancy to calculate a potential percentage of families that could have had the traditional structure of elderly kin living with younger generations, Ruggles shows that a high percentage of those families that could have this structure actually did. By contrast, life expectancy has risen in the 20th century, and yet the traditional family structure is found even less often.

Comparative quantitative studies offer one way to make meaning out of numbers in a way that is detailed and exciting much in the same way as first-hand accounts provide meaningful qualitative data. It would be a shame to push them aside completely.


As I said in class, this week before we meet you should take some time to participate in a crowdsourcing project to see how some institutions are digitizing their content. Everyone should take a different one so that we can compare notes about the possibilities and pitfalls of this sort of thing. You’ll probably be happiest if you can find something that maps against your interests (try googling “Crowdsourced ___ history” or something as a last resort to find projects).

Spend enough time to make a contribution to the archive, but also browse around and be ready to report to the rest of us how well the project is working, what sort of contributions it seems to be getting, and if it’s a model extensible to other projects. Would you be able to apply these methods to a project yourself? Could you go about digitizing your own research artifacts in these same ways?

Some possibilities:

DIYHistory | Transcribing Cookbooks

The University of Iowa Library has an area for crowdsourcing history by helping to transcribe the digitized, handwritten Szathmary Culinary Manuscripts and Cookbooks. By transcribing them, or by checking transcriptions made by other users, volunteers help to make them full-text searchable and therefore more easily accessible to researchers and to the public. Clearly, this is important to someone like me, who is writing a paper on 19th century cooking in the United States. I have created an account and have already taken a look at a few of the cookbooks available to transcribe, based on my chosen time period of study.

Find the site here.