Explaining Digital Tools and Methods in History Writing

According to Gibbs and Owens in “The Hermeneutics of Data and Historical Writing,” new digital methods of data collection, analysis, and display require a new “level of methodological transparency.” They advocate open documentation and presentation of the process of working with data, not only to inform readers of how they reached their conclusions, but also to familiarize readers with the different ways one can use data for historical research and analysis. As Gibbs and Owens put it, “We need to teach each other how we are using and making sense of data.” They also use “data” in a much broader sense: to them, data is not synonymous with evidence. Digital historians and humanists do not work with data only as a confirmatory exercise; they can also use digital tools and methods to discover and frame new research questions. The mere availability of certain data sets, and of the tools for interpreting them, opens up exciting new options for historical inquiry. Stephen Ramsay calls this the “hermeneutics of screwing around,” which includes using digital tools to formulate research questions and embracing the creative failures that steer your research or analysis in a particular direction.

Gibbs and Owens, however, call for open and accessible documentation of the process of using and analyzing data, including these initial steps of discovery and creative failure. I think this open documentation is important and useful. It not only allows your readers to understand how you collected, used, analyzed, and manipulated your data, but also serves as a way to familiarize your audience (particularly your non-digital colleagues) with these new tools and data sets.

But I cannot help but wonder what this transparency will look like. Let us assume that someone is trying to publish a traditional monograph while being as transparent as Gibbs and Owens suggest in their piece. Will it take the form of an exhaustive and detailed introduction? That might discourage readers from looking at the rest of the work. What if it were included at the end of the monograph as an appendix (or appendices)? That might discourage readers from reading the section at all; I know from personal experience that appendices are often skimmed, if not ignored entirely. What about blogging about the process of researching and writing the monograph? That would avoid the first two problems, but by separating the documentation from the monograph you risk readers not knowing about, or not having access to, your blog. The blog would have to be explicitly cited in the monograph, and even then you cannot guarantee readers will visit the site. The most effective way of integrating this transparency into your text might be to present your monograph in a digital format, as Gibbs and Owens do with their chapter, layering your methodology and process through a series of visualizations, hyperlinks, and other pages. But even that has its drawbacks. In academia, where peer review and publishing still play such a significant role in hiring and tenure decisions, can someone other than a tenured professor risk presenting their entire work online? And even then, would they?

Now, I do not have any “answers” to this issue, but I think it is useful for anyone considering digital work to think clearly about how they are going to represent their research and analysis most effectively. Maybe an exhaustive introduction to the digital work will serve best. Maybe it is best decided on a per-project basis. Or maybe one might combine these strategies (e.g., presenting both a digital and a print format, or including a digital companion to a hard-copy work). If you are aiming for the kind of transparency that Gibbs and Owens suggest, these are issues you must confront.

Rectifying Maps for the NYPL

For this week’s “making things digital” class, I decided to do something a little different from digitizing text. When I saw Ben’s post of suggestions, I was immediately drawn to the last option: rectify maps for the New York Public Library. I had done a bit of basic GIS before and was intrigued that the site has a built-in rectifying tool rather than requiring complex and expensive GIS software.

I went to the site, watched the video tutorial (not the best quality video, but it told me exactly what I needed to know), and decided to give it a try. Rectifying a map involves three main elements: the historical map, the base map, and your control points. To rectify a map, the user places control points on corresponding locations on the historical map and the base map; each point on one map is paired with a point on the other. By carefully placing enough of these control points, the user can warp the historical map so that it matches up with the modern base map.
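
For anyone curious what the warper is doing under the hood, the core idea can be sketched in a few lines of code. This is not the NYPL tool’s actual implementation, just a minimal illustration of fitting a simple (affine) transform from paired control points; the coordinates below are invented for the example.

```python
import numpy as np

# Invented control points: pixel locations on the historical map paired with
# the corresponding longitude/latitude on the modern base map.
historical_px = np.array([[120.0, 340.0], [610.0, 355.0], [300.0, 80.0]])
base_lonlat   = np.array([[-75.2, 39.4], [-73.9, 39.5], [-74.8, 41.1]])

# Fit an affine transform (the simplest kind of warp) by least squares:
# [lon, lat] ~= [x, y, 1] @ A
X = np.hstack([historical_px, np.ones((len(historical_px), 1))])
A, *_ = np.linalg.lstsq(X, base_lonlat, rcond=None)

def to_base(pixel_xy):
    """Estimate where a historical-map pixel lands on the base map."""
    x, y = pixel_xy
    return np.array([x, y, 1.0]) @ A

print(to_base((400.0, 200.0)))  # estimated lon/lat for that pixel
```

Real georeferencing tools can use more flexible warps than this, but the principle is the same: the paired control points determine how every other pixel gets moved.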

The next step was to choose what kind of maps I wanted to rectify. I wanted a place and scale I was familiar with, so I started searching for historical maps of my home state, New Jersey. I found two maps that interested me and began rectifying them. One is a 1795 engraving of New Jersey by Joseph Scott of The United States Gazetteer (Philadelphia), and the other is an 1873 map of New Jersey from the Atlas of Monmouth Co., New Jersey. Here are images of the historical maps before rectifying them:

NJ Map 1795 (before)

NJ Map 1873 (before, with some control points shown)

I have included some of my control points in the image of the 1873 map so that you can see what they look like. In order to properly rectify a map, you must have more than one control point; the NYPL site requires at least three before it will rectify the historical map against the base map. The warper also includes a mechanism that reports how far off (the margin of error) each of your control points is between the historical map and the base map. The tutorial video instructs you to make sure each control point has a margin of error of less than 10. Going into this, I assumed that more control points linking my historical map to my base map would result in a more accurate rectified map. However, that holds only if you can keep each control point under that margin of error of 10, and adding more control points can throw off the margin of error of the points you have already placed. So it is not always best to have the greatest number of control points; instead, you should place control points in the positions that yield the least error. Each map is also unique, so you need to work out the best arrangement and number of control points for it. I am not saying that my rectified maps are perfect (they are far from it), but I found that around six control points did the trick.
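
I do not know exactly how the NYPL warper computes its error figure, but the general idea behind a control point’s residual is easy to sketch: fit the transform from all of the points, then measure how far each point lands from where its partner says it should. The numbers below are invented; the point is that one badly placed pair changes the fit, and therefore every other point’s error, which matches what I saw on the site.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src points onto dst points."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A

def residuals(src, dst, A):
    """Distance between each warped src point and its paired dst point."""
    X = np.hstack([src, np.ones((len(src), 1))])
    return np.linalg.norm(X @ A - dst, axis=1)

# Invented control-point pairs (historical-map pixels vs. base-map pixels).
src = np.array([[100.0, 100.0], [500.0, 120.0], [300.0, 400.0], [480.0, 390.0]])
dst = np.array([[110.0,  95.0], [515.0, 118.0], [310.0, 410.0], [430.0, 300.0]])  # last pair badly placed

A = fit_affine(src, dst)
print(residuals(src, dst, A))  # the bad pair inflates everyone's error, not just its own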

After placing these control points, I cropped the historical map a bit so that it would fit better on the base map, clicked “Warp Image!”, and then played around with the transparency settings of the historical map to produce these new rectified maps (a sketch of what that blending step amounts to follows the images):

NJ Map 1795 (after)

NJ Map 1873 (after)
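
For anyone wondering what the transparency slider amounts to, the overlay step can be approximated off-site with a couple of lines. The file names here are hypothetical placeholders standing in for a warped historical map exported from the warper and a base-map image of the same area.

```python
from PIL import Image

# Hypothetical file names -- not actual NYPL exports.
historical = Image.open("nj_1873_warped.png").convert("RGBA")
base = Image.open("nj_base.png").convert("RGBA").resize(historical.size)

# alpha plays the role of the transparency slider:
# 0.0 shows only the base map, 1.0 only the historical map.
overlay = Image.blend(base, historical, alpha=0.5)
overlay.save("nj_1873_overlay.png")
```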

Now I will offer a few final thoughts about the process (although I definitely expect to do this again). First, rectifying maps is a frustratingly precise process. Borders, state lines, and towns on the base map are often in very different locations on the historical map, or missing from it entirely. When you are placing control points, you also have to be constantly aware not only of whether each point lines up with the correct location on both the historical map and the base map, but also of how each point affects the margin of error of every other point you have placed. For example, I tried rectifying a map of the entire United States and was able to place three control points with very little margin of error for each: one at the northwestern corner of Washington, one at the southwestern corner of California, and one at the southernmost point of Texas. However, as soon as I placed a fourth control point, the margin of error seemed to skyrocket for all four points, no matter where I put the new one. The problem may have been with the first three points, but it prompted me to scale down my efforts from maps of the entire United States to maps of New Jersey.

There is one last thing I wanted to comment on, and it deals with the base map. The entire process of rectifying these maps involves warping the historical map to fit the base map. This one-way process assumes that the base map is the standard of accuracy and that all other maps must conform to its scale and borders. I think this assumption too often goes unexamined. I understand the need for a standard map, but could it not also be useful to have the program do the reverse? What if it generated an overlay of the historical map on the base map AND an overlay of the base map on the historical map? What kind of value would an arrangement like that have? I am not sure, but I think it is something that at least needs to be considered (a toy sketch of what such an inverse warp would involve appears at the end of this post). There are also many historical maps that contain different information than the base map and are therefore incompatible with the rectifying process (although they are still listed on the site). I just wonder whether, by placing such confidence in the base map, we are losing important information from the historical map. I’ll finish this post by showing one of those maps that are listed on the NYPL site but could not possibly be rectified to our modern base map. There are many of them, but this one in particular stuck out as a very valuable and informative map that is completely incongruous with the base map.

1671 Depiction of Floridans
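
A footnote to the reverse-overlay question above: the warper’s one-way workflow is a design choice, not a mathematical necessity. A fitted warp is just a transform, and at least in the simple affine case it can be inverted, so base-map coordinates could in principle be projected back onto the historical map. The matrix below is a placeholder for illustration, not output from the NYPL warper.

```python
import numpy as np

# Placeholder 3x3 affine matrix standing in for a fitted historical->base warp.
hist_to_base = np.array([
    [1.02,  0.03, -120.5],
    [-0.01, 0.98,   45.2],
    [0.0,   0.0,     1.0],
])

# The "reverse" overlay is just the inverse transform: base-map coordinates
# mapped back into the historical map's pixel space.
base_to_hist = np.linalg.inv(hist_to_base)

point_on_hist = np.array([350.0, 410.0, 1.0])   # a pixel on the historical map
point_on_base = hist_to_base @ point_on_hist    # where it lands on the base map
print(base_to_hist @ point_on_base)             # and back to where it started
```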

Transcribing Letters

Trinity College Dublin has just started a public humanities project, the Letters of 1916, a crowdsourced digital archive of letters written around the time of Ireland’s Easter Rising of 1916. They have compiled letters written between November 1915 and October 1916 held in institutions such as the National Library and the National Archives, and have also issued a call for letters from private collections. They are looking to create a collection covering a range of topics, such as art, politics, WWI, and the Easter Rising. I’ve registered to contribute by transcribing the letters they have on the site so far.

Accessibility in the Digital Humanities

Accessibility is both a positive and a negative force when discussing the possibilities of the Internet age. At the dawn of the Internet, its possibilities, though they seemed endless, were difficult to grasp for those who could have benefited most. Unintended consequences stemming from a lack of foresight into the Internet’s capabilities can be seen in nearly every industry. It is probably most evident in journalism, where newspapers originally made their content free to all users, then hid it behind paywalls, and found that these decisions had consequences both for print and for the born-digital news sources that had benefited from that open access. This is a problem for the humanities and for libraries as a whole, especially regarding open access under copyright, as discussed in the article on Google Books.

In our first week of readings, Cohen and Rosenzweig treat this newfound accessibility on the Internet as an advantage for historians because of the ability to reach a wide audience at “zero marginal cost”: “The Internet allows historians to speak to vastly more people in widely dispersed places without really spending more money—an extraordinary development.” They also discuss inaccessibility and the problems that stem from the digital divide in computer ownership, particularly in a global context, as well as the problem of monopoly. These arguments tie in directly to our readings this week.

The article on Google Books presents the desire for open access as an Enlightenment principle, one that our country was founded on. Its author places the responsibility for this open access on libraries, which missed their chance in the early days of the Internet to make more content available to their users. Google picked up the mantle in 2004 by launching Google Books and facing down copyright lawsuits brought by authors and publishers alike. As a result, Google has the advantage of controlling all the digitized copies of books it puts on the web. The author also voices his concerns about payment. Would we see what happened in print journalism, or with scholarly journals and libraries, happen with Google Books? Would the payments become so steep that libraries would be forced to dedicate large portions of their already-stretched-thin budgets to give their users access?

Accessibility is also touched upon in “The Hermeneutics of Data and Historical Writing.” The authors believe that historians need to rethink the nature of historical writing by de-emphasizing the narrative and giving greater access to their data-based methodology. They see this as a way to break down walls between disciplines as well as walls between researchers and their audiences. By changing the format of historical writing to allow for these “twentieth century footnotes,” we would see a greater understanding, not only in our field but also in others, of how to use newly available data to become more accessible. Not only would such work become more “user-friendly,” but it would also encourage more historians to think beyond the linear and traditional ways of using data in historical work.

The issue of accessibility is not going to disappear overnight. In fact, on Monday, articles such as this one from Forbes appeared about the continuing legal battle over Google Books and its right to digitally share books with all users. By making more content available and their methodology more transparent, historians and all those working in the digital world can find ways around the unintended consequences of an open Internet.

The Potential of Cliometrics

Anyone who reads Robert Fogel and Stanley Engerman’s “Time on the Cross” knows immediately that its claims were bound to cause controversy. A significant part of this controversy, no doubt, stems from the heavy use of numerical data in the analysis of the institution of slavery. The numbers seem cold and barely capture the macabre picture of slavery that we are used to encountering in more traditional, humanistic accounts. Worse still, at times Fogel and Engerman’s language seems to allude to the “Uncle Tom” image of pitiful subservience and obedience when describing the typical slave. Though the authors did not mean to suggest this (they say they admire black achievement under the adversity of white overlordship), one cannot help but conjure the image when they speak, for example, of the supposed motivation of slaves to be appointed to “better” roles on the plantation.

Despite the controversy, I think it is a shame that this study may have caused cliometrics to fade into the background of historical research, because it offers some useful tools for historians. Its capabilities as a tool for comparative studies struck me as particularly strong. One relatively non-controversial section of “Time on the Cross” is the first chapter, where Fogel and Engerman discuss some of the differences between slavery in the United States and in the Caribbean. They compare slave imports into the Caribbean and the U.S., the foreign-born slave population with the rest of the U.S. population, and the growth of the actual slave populations in the two regions to expose very real differences between the slave trades of the U.S. and the Caribbean, differences that the numbers make more explicit. The reader gets a harrowing portrait of slaves being sent to the Caribbean in droves to replace those who had succumbed to tropical diseases, while in the U.S. the slave population became “naturalized,” creating a potentially different dynamic to be further explored by historical study.

Steven Ruggles’ “The Transformation of American Family Structure” is another example of using quantitative comparisons to reveal intriguing facts. Some scholars claim that the traditional family structure never existed, we learn. Yet Ruggles suggests that although these “extended households” may have been a minority, they were still an ideal that directed behavior more often than not. By using life expectancy to calculate the potential percentage of families that could have had the traditional structure of elderly kin living with younger generations, Ruggles shows that a high percentage of the families that could have had this structure actually did. By contrast, life expectancy rose over the twentieth century, and yet the traditional family structure is found even less often.
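
To make the logic of that comparison concrete, here is the arithmetic in miniature. The numbers are invented, not Ruggles’ figures; the point is the distinction between the demographic ceiling (families that could have contained a surviving elderly parent at all, given mortality) and the share of those eligible families that actually co-resided.

```python
# Invented counts, for illustration only -- not Ruggles' data.
families = 1000
with_surviving_elderly_parent = 250   # the demographic ceiling set by life expectancy
actually_coresident = 175             # of those, households where the parent lived with younger kin

ceiling_share = with_surviving_elderly_parent / families
realized_share = actually_coresident / with_surviving_elderly_parent

print(f"Only {ceiling_share:.0%} of families could have had the 'traditional' structure,")
print(f"but {realized_share:.0%} of those that could, did.")
```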

Comparative quantitative studies offer one way to make meaning out of numbers that is detailed and exciting, much as first-hand accounts provide meaningful qualitative data. It would be a shame to push them aside completely.

Crowdsourcing

As I said in class, this week, before we meet, you should take some time to participate in a crowdsourcing project to see how some institutions are digitizing their content. Everyone should take a different one so that we can compare notes about the possibilities and pitfalls of this sort of thing. You’ll probably be happiest if you can find something that maps onto your interests (try googling “crowdsourced ___ history” or something similar as a last resort to find projects).

Spend enough time to make a contribution to the archive, but also browse around and be ready to report to the rest of us how well the project is working, what sort of contributions it seems to be getting, and whether it’s a model extensible to other projects. Would you be able to apply these methods to a project yourself? Could you go about digitizing your own research artifacts in these same ways?

Some possibilities:

DIYHistory | Transcribing Cookbooks

The University of Iowa Libraries host a crowdsourcing site where users help transcribe the digitized, handwritten Szathmary Culinary Manuscripts and Cookbooks. Transcribing them, or checking transcriptions made by other users, helps make them full-text searchable and therefore more easily accessible to researchers and to the public. Clearly, this is important to someone like me, who is writing a paper on 19th-century cooking in the United States. I have created an account and have already taken a look at a few of the cookbooks available to transcribe, based on my chosen period of study.

Find the site here.

Barely escaping my own bias…

My perception of Time on the Cross initially formed into hazy mistrust as I read through facts and figures completely devoid of citations (before I learned of the material presented in the second volume). The density of authoritatively delivered information, especially the graphic representations and explicit conclusions populating the book, made for an uneasy reading (I probably read most of it squinting skeptically at the page). My continued disturbance at what seemed to be an alarming lack of emotional or humanistic weight given to incidences of whipping was then reinforced by Haskell and Gutman, resonating particularly as the statistics were reevaluated and restated to account for the purpose and significance of such violent corporal punishment. This highlighted what I saw as one of the crippling weaknesses of evaluating history with computational scholarship: the decontextualization of the data. Arriving at such a low rate of whippings per slave per year means nothing when the fact that it happened at all was significant to the population as a whole. Clearly, however, Time on the Cross was hardly universally accepted as exemplary work even by cliometricians, due to the gaffes the authors made in their computational analysis. Even after looking past the misrepresentation of the number of whippings as a proportion per slave, “…the figure is too low because it is based on an erroneous count both of the number of slaves Barrow owned and the number of times he whipped them” (Haskell). For a discipline so often forced to rely on assumptions and inferences to combat a dearth of actual historical data, the inability to correctly utilize what was available seems like a particularly egregious mistake.
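
Haskell’s point about the Barrow figures can be illustrated with toy arithmetic (the numbers below are invented, not the actual plantation counts): an undercounted numerator and an overcounted denominator both push the per-slave, per-year rate down, producing exactly the kind of decontextualized figure that then gets quoted.

```python
def whippings_per_slave_per_year(total_whippings, num_slaves, num_years):
    return total_whippings / (num_slaves * num_years)

# Invented counts, for illustration only -- not the actual Barrow records.
print(whippings_per_slave_per_year(160, 200, 2))  # undercounted whippings, overcounted slaves -> 0.4
print(whippings_per_slave_per_year(200, 130, 2))  # corrected counts -> ~0.77, nearly double
```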

Fogel and Engerman claim that historians were “overly preoccupied” with the destruction of slave families on the auction block, and that this preoccupation prevented historians from recognizing the “strength and stability that the black family acquired despite the difficult circumstances of slave life” (52). Regardless of the fact that cliometricians have discredited the assumptions used to arrive at the conclusion that slave masters were unwilling to separate families, I fail to see how one can be overly preoccupied with the destruction of families, or how this precludes recognizing the strength a family shows in enduring it. Fogel and Engerman err when they try to present achievement under adversity by lessening the scope of the adversity and superficially documenting the achievement (crediting slaves with knowledge of the full range of agricultural activities, from planting to harvesting (41)). This admittedly emotionally motivated response to Fogel and Engerman, as opposed to Ruggles, definitely highlights my bias; it is harder to see the merits of their scholarship when they address such a sensitive topic. It is clear, however, that quantitative analysis creates many more opportunities to broaden the scope of historical practice. In Ruggles, where some assumptions are qualified and inferences tested, there is an acknowledgment of the limitations of quantitative data in interpreting human activities: “we may never know if people today care more about their families…” Nonetheless, synthesizing data from IPUMS with the context of social conditions during the periods analyzed allowed Ruggles to critique sociological theories and develop the body of scholarship on the transformation of family life.

I find that Fogel and Engerman, having admitted to shaping the presentation of their work in order to “popularize” cliometrics, subjected themselves to the same pitfalls that traditional historians slip into through the “fuzzier” work of interpretation. It is here that I agree with Abigail: working with numbers hardly frees a researcher from errors or bias, whether in source material or in presentation. When historians attempt to derive numerical values, where no historical evidence for them exists, for use in calculations of assumed or inferred relationships, there does not seem to be a drastic distinction between the way quantitative and traditional historians evaluate their material.

Qualifications in Cliometrics/Quantitative History

This week’s readings, while occasionally controversial, have shown how cliometrics, quantitative history, and the digital world as a whole have opened up new opportunities for historians to gather and assess vast amounts of data in drawing conclusions about the past. Websites like IPUMS have made this data even more available to scholars and, as discussed in the Ruggles article, have made it easier to code demographic conditions alongside census data, making these conclusions even more accurate. However, there is trouble when trying to use the mathematical world to interpret the past.

This trouble is evident in Time on the Cross. When reading it, I will say that some of the wording and phrasing struck me as a bit sketchy. I had assumed that most of the backlash would come from people like me, versed in traditional history, who felt that the authors had not considered enough of the stories behind slavery and had focused on numbers instead of human beings. However, like Haskell, I was surprised to find that much of the backlash came from the way the authors calculated their numbers. In his critique of the work, Haskell employs not only economic theory but also traditional historical methodology. One example: “Fogel and Engerman note the total absence of large free labor farms in the South and attribute it to the superior efficiency of slave labor. But it might just as well be attributed to the shortage of free laborers and the ideological opposition that slaveowners no doubt would have mounted against an alternative mode of production…” Clearly, cliometrics has its merits in its ability to disseminate large amounts of information in a clear and less time-consuming manner, but when taken out of context, it can bear very little meaning in the study of history.

In an example of using quantitative history in the context of the period, the IPUMS website linked to a New York Times article from April 2, 2012 showing how newly digitized census data suggests that the Civil War death toll was about 20% higher than the figure historians had been reporting for over a century. The new estimate factors in information such as the type of health care available in the Union and the Confederacy, the high immigrant presence in the armies, and female death rates. However, Dr. Hacker, who came up with the new figures, admits that many of these numbers rest on estimates and assumptions that keep his data from being completely accurate.
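
I do not know the details of Dr. Hacker’s model, but the general census-survival idea behind estimates like this can be sketched simply: compare how many men counted in one census should still appear in the next census under a peacetime survival rate with how many actually do, and treat the shortfall as excess wartime deaths. The figures below are invented for illustration, not Hacker’s data or model.

```python
# Invented figures, for illustration only -- not Hacker's data or model.
men_counted_1860 = 4_000_000          # men of military age in the 1860 census
baseline_survival_rate = 0.90         # assumed ten-year survival absent the war
men_counted_1870 = 3_200_000          # the same cohort found in the 1870 census

expected_survivors = men_counted_1860 * baseline_survival_rate
excess_deaths = expected_survivors - men_counted_1870

print(f"Expected survivors: {expected_survivors:,.0f}")
print(f"Excess (war-related) deaths: {excess_deaths:,.0f}")
```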

My questions from this week’s readings stem from the qualifications that each author makes regarding their cliometric and quantitative conclusions. In Time on the Cross, the authors acknowledge that the information they are presenting is controversial, but their contempt for past research into the economics of slavery keeps them from clarifying exactly how much of their data relies on qualifiers and assumptions about the Confederacy. The other authors and researchers make clear that using data and numbers to draw conclusions about the past involves a great deal of guesswork, and that arriving at an exact, perfect number appears to be nearly impossible. I feel that this connects cliometrics and quantitative history to the traditional study of history: when studying primary sources, a historian can never be completely sure they are getting the entire story. There is always bias or a hidden motive in records of the past, and in that way it appears that whether we are using numbers or words, there is always room for error in a historian’s work.