DH Project, Conclusions and Future Research

(Note: This is the concluding post in my series exploring text analysis, visualizations, and stories about the Boston Marathon from the Our Marathon archive. Each post can be found on my personal blog or by navigating the table of contents included at the end of this post.)

From these previous posts, I hoped to show a few things. First, I wanted to showcase how using various text analysis platforms, paired with some closer reading of these texts and some manual searches, allows for a much richer investigative experience. Despite their drawbacks, Word Clouds and Phrase Nets, when used effectively across various text analysis tools, allow us to quickly visualize texts and formulate research questions. They can even help us devise some preliminary findings.
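(As an aside: the tools behind these posts were all ready-made platforms, but the word-cloud idea itself is easy to reproduce in code. Here is a minimal, hypothetical sketch in Python using the wordcloud package; the sample text is invented, not drawn from the archive.)

    # hypothetical sketch: a word cloud in Python with the wordcloud package
    from wordcloud import WordCloud, STOPWORDS

    # invented sample text standing in for a real corpus
    text = (
        "boston marathon runners finish line boylston street "
        "boston strong community race spectators city runners"
    )

    cloud = WordCloud(width=800, height=400, background_color="white",
                      stopwords=STOPWORDS).generate(text)
    cloud.to_file("marathon_cloud.png")  # writes the image to disk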

Second, I aimed to showcase some of the ways in which the Globe Stories were structurally different from the Public Submissions. In The Wounded Storyteller (1995), sociologist Arthur W. Frank distinguishes between three different types of narratives people formulate when dealing with a trauma. Among them are restitution narratives, which follow a three-part structure moving from health to sickness and culminating in a hopeful and happy return to health or “normal” (in his case, he was looking at those struggling with severe illnesses). From some of the conclusions mentioned in the “Where are the Bombers?: What Can Word Clouds Tell Us?” and “#BostonStrong” posts, I think we can classify a significant number of stories as following this general structure. Especially in the “Boston Strong” stories, one can see this narrative pattern, particularly the final section stressing a hopeful return to health–but here, the health not only of individuals but also of the city as a whole. The Globe Stories are harder to classify, perhaps because of their brevity. These stories, on the whole, are shorter than their Public Submission counterparts. Many of them read more like abstracts than full stories, which could account for some of the differences between the two sub-corpora.

These posts have shown me a lot about what kinds of questions and conclusions you can derive from analyzing text. However, I do have some criticisms of my own work. First, and I may have discovered this a little too late, I do not think my corpus was substantial enough to draw definitive conclusions about these stories. The Our Marathon archive has been collecting stories for a little less than eight months at this point, so I decided to take a look at everything we had. I had spent a lot of time learning how to do some topic modeling with MALLET, thinking that it would be a substantial addition to my posts. Instead, I realized that no matter how I tweaked the model, the topics that I generated did not reveal any real insights–likely, in part, because the corpus was still so small. At that point, I decided to focus on my other sections.
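For readers unfamiliar with topic modeling, “tweaking the model” mostly means adjusting parameters such as the number of topics and the number of training passes. MALLET itself is a Java command-line tool, so the sketch below is only a rough Python stand-in using gensim, with a few invented snippets in place of the actual stories.

    from gensim import corpora
    from gensim.models import LdaModel

    # invented stand-ins for the Our Marathon stories
    stories = [
        "we watched the race from the finish line on boylston street",
        "the city came together in the days after the bombing",
        "runners and spectators helped each other through the chaos",
    ]

    texts = [story.split() for story in stories]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # "tweaking the model" largely means varying parameters like these
    lda = LdaModel(corpus, id2word=dictionary, num_topics=3, passes=20)
    for topic_id, words in lda.print_topics(num_words=5):
        print(topic_id, words)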

That being said, I still think this series of posts has its value, not only for the preliminary findings I was able to extract, but also as a building block for future analysis with the Our Marathon material. Our Marathon is continuing to grow, and we are planning quite a few outreach days to gather more submissions to the archive. As we gather more material, I can re-apply these methods to check whether my hypotheses still hold. Moreover, as we incorporate more stories, I hope to revisit some of the questions that my preliminary investigation revealed.

Also, the Our Marathon team is currently involved in a very substantial oral history project with people directly involved in and influenced by the marathon bombings (e.g., survivors, first responders, and local business owners). As these audio files are uploaded into the archive, we are making sure they are accompanied by full-text transcriptions. Once these transcriptions are in the archive, I could incorporate that text into my analysis and compare oral versus written narratives of experiences during the marathon bombings. Who knows? I might finally be able to generate a meaningful and insightful topic model of the Our Marathon stories.

TABLE OF CONTENTS

  1. Share Your Story: Storytelling and the Boston Marathon Bombings
  2. Where are the Bombers?: What Can Word Clouds Tell Us?
  3. #BostonStrong
  4. “Fireworks or Cannons”: Phrase Nets of the Marathon Stories
  5. Conclusions and Future Research (this post)

Credit Transparency and the Collaborator’s Bill of Rights

I want to take this opportunity to expand a bit on what Abigail posted regarding collaboration on digital projects. In particular, I want to draw your attention to and work through the provisions of the “Collaborator’s Bill of Rights,” which is part of a larger report entitled “Off the Tracks: Laying New Lines for Digital Humanities Scholars.” This report was the result of a workshop on professionalization and ethics in digital humanities centers held January 20-21, 2010. According to the report, “This workshop addressed the rapidly emerging phenomenon of alternative academic careers among the hybrid scholar-programmers now staffing many DH centers,” particularly focusing on the murkiness of potential professional development and career trajectories for those working on digital projects and utilizing digital methods. Hosted by Tanya Clement and Doug Reside, this workshop brought together many prominent digital humanists to discuss, write down, and possibly clear up some of this murkiness. One important section was on collaboration, and the “Collaborator’s Bill of Rights” provides a set of guidelines for collaborating and crediting collaborators on digital projects. When discussing publishing, particularly collaborative digital publishing, this document raises a few issues with collaboration and provides us with an excellent place to begin a discussion. And while this document might not represent a consensus of the entire digital humanities community (if there ever really is one), it does represent the collaborative work of many well-known and widely read digital humanities practitioners. Moreover, by considering the four main sections of the “Collaborator’s Bill of Rights,” we can explore not only the proper way to organize, document, and credit collaboration, but also what the major issues with collaboration seem to be.

The “Collaborator’s Bill of Rights” is divided into four main sections. The first section states:

1) All kinds of work on a project are equally deserving of credit (though the amount of work and expression of credit may differ). And all collaborators should be empowered to take credit for their work.*

This statement not only calls for fairness in crediting work on digital projects, but also draws our attention to “all kinds of work on a project,” reminding us that work on large, collaborative digital humanities projects often involves very different roles. It is very rare for a single individual (for the purposes of this scenario, let’s say an individual scholar) to be responsible for every part of a digital project. Often we require the help of web developers, programmers, librarians, and archivists. This entry, therefore, calls our attention to how a project manager is responsible for crediting each collaborator according to the magnitude and importance of their contributions to a project. In a more lopsided collaboration, one might be able to get away with an “Acknowledgements” page—similar to what we often see in scholarly monographs. However, one needs to be sure to attribute credit to each individual working on the project according to the importance of his or her contribution.

This section also brings up the agency of each collaborator to take credit for his or her work, but it is explored in more detail in the next section:

2) The DH community should default to the most comprehensive model of attribution of credit: credit should take the form of a legible trail that articulates the nature, extent, and dates of the contribution. (Models in the sciences and the arts may be useful.)

Section two is divided into two parts. First, it concerns the DH community as a whole, saying it should emulate the models of credit utilized by the sciences and the arts as a useful starting point. I think this makes sense: collaborative work appears to be more common in those fields than in the humanities, so we might as well borrow from what works for them. The key point here is to maintain a trail of credit that is as detailed and accurate as possible, so that everyone receives the credit that they deserve. But the second part of this entry (text not provided in this blog post) is, I believe, more revealing. It concerns the rights of collaborators in being credited and crediting themselves in “Descriptive Papers & Project Reports,” “Websites,” and on “CVs.” These three subsections establish the right of each collaborator to demand credit appropriate to their contribution. Notably, this section does not define what makes one contribution more worthy of credit than another; instead, it leaves it up to the collaborators to determine credit value. This is important because each project can then make an honest appraisal of what kind of credit different types of work deserve. At the same time, the “CVs” section gives the individual collaborator the right to “express their contributions honestly and comprehensively” on their curriculum vitae. Overall, these first two sections stress a need for transparency regarding the collaboration process and workload, while maintaining academic honesty and integrity from both the project as a whole and the individual collaborators.

The third section of the Bill of Rights concerns problems with credit and access to a project that has particular institutional support:

3) Universities, museums, libraries, and archives are locations of creativity and innovation. Intellectual property policies should be equally applied to all employees regardless of employment status. Credit for collaborative work should be portable and legible. Collaborators should retain access to the work of the collaboration.

This brings up another issue with digital projects. Since digital projects are often funded by and linked to a particular institution or group of institutions, issues of intellectual property often arise when a collaborator (by choice, or by necessity) moves from an institution participating in a project to another institution. Again, this section stresses an ethical consideration for collaborators. Even if they leave the institution supporting a digital project, their contribution should still be recognized and credited, and they should not be “locked out” of further developments in the project. I think this section is of vital importance for a collaborator, because if the digital humanists writing this report felt the need to address this, then it has obviously been a problem in the past. This entry follows a key theme of the rest of the Bill of Rights—everyone should be accountable and credited for his or her contribution to a project, regardless of his or her role (technical, research, metadata, etc.) and institutional affiliation. The key principle here is not only for each collaborator to receive the credit they deserve, but also to allow collaborators appropriate access to their own work.

The final section looks into an important part of any digital project: funding. Recognizing that those who fund digital projects often have a significant influence on the vision and purpose of a digital project, the Bill of Rights states:

4) Funders should take an aggressive stance on unfair institutional policies that undermine the principles of this bill of rights. Such policies may include inequities in intellectual property rights or the inability of certain classes of employees to serve as PIs.

It again stresses ethical consideration of, and credit for, individuals’ contributions to a project, but looks to funders as a top-down way of making sure this happens. I understand this as a last-chance approach: if the individuals working on a project fail to accurately and ethically attribute credit to collaborators, and institutional policies also interfere with transparency in crediting contributions, then it is the role of the funders to step in and make sure it is done correctly. I think this might be a bit naïve, especially since the very institutions that prevent this transparency of credit are often the ones funding digital projects. However, I do think it is important to recognize the vital oversight role funders play on digital projects and to encourage them to use that role ethically and responsibly. Overall, these principles encourage open and honest collaboration at the various levels of a digital project. From individual collaborators and project managers to institutions and funding sources, the “Collaborator’s Bill of Rights” tries to create an open and ethical environment where digital projects can flourish through effective collaboration—without each collaborator worrying that they will not receive credit for their work.

Now I know this did not have much to do directly with digital publishing, but I feel these principles on collaboration are crucial to understand in order to begin conceiving a collaborative digital project. Furthermore, I think having a plan for collaboration and for crediting contributions allows for more effective and creative digital products.

 

*All quotes are taken directly from the “Collaborator’s Bill of Rights”.

The Multimedia is the Message?

Some of you may be familiar with Marshall McLuhan’s “the medium is the message” argument. Others might not be. This post does not focus on McLuhan’s argument, but it is informed by it, so I just wanted to provide a bit of context. In his 1964 book, Understanding Media: The Extensions of Man, McLuhan argued that the medium of a story or narrative (defined very broadly), far more than its content, shapes what type of story a person tells.* The particular benefits and drawbacks of a medium influence your message or final product much more than the content of your story. For example, a movie, because of its ordered sequencing of events, distorts a story into a series of linear connections. Therefore, it is the abilities and restrictions of the medium that shape the narrative, or message, of your movie far more than the content of the movie itself.

But when talking about multimedia projects such as Pine Point, Snow Fall, and Invisible Australians, I cannot help but wonder what this does to McLuhan’s idea of “message.” If the multimedia is the message, then what does this mean? What do we look at to see this message? Do we look at the story as a whole? We could break down each individual “medium” in the multimedia story and search for individual messages. Maybe by finding the messages of each part of the story, we can gain a better understanding of the whole. But then do we not run the risk of missing out on the big picture of the story? We could also list the multiple types of media used and see which one predominates in the presentation, but then I think we sometimes miss out on the multi- part of the multimedia. What separates these multi- and new media stories from a monograph or movie is the variety of mediums utilized to tell the story. Consequently, although the various forms of media are vital in shaping the message of a new media story, I think the content in these stories is equally important in shaping the message. In order to understand the overall messages of Pine Point, Snow Fall, or Invisible Australians, one has to consider all these questions with an equal attention to the content. These multimedia stories are profoundly shaped by the individual types of media in them and by the overall effect of these different forms of media. But they are also shaped by the connection that brings all these forms of media together: the content. Because there are so many different mediums at work, we must consult the content in order to better understand the message of multimedia.

Let’s take a look at Pine Point and attempt to uncover the “message.” Pine Point is an exercise in memory, particularly the fleeting memory of a short-lived mining town. It uses images, short video clips, audio clips, and short pieces of text to describe Pine Point. But it is not trying to form a cohesive story. It is a selection of experiences and memories of Pine Point. It does not follow a set chronological order, but instead jumps between the past and the present. The knowledge that this town does not exist anymore is always present, even in the recollections. With the content in mind, we can better understand the organization and orientation of the new media story. Everything returns to the eventual closing of Pine Point. These clips, snapshots, digitized artifacts, and short text or audio blurbs provide us with a scrapbooking effect. This presentation is a collection and display of memory long after the events occurred, but it is also an account of the present. Like our own memories, this story serves as a necessarily imperfect recall of fleeting experiences. We cannot remember everything exactly as it occurred, and our memories are constantly shaped by current knowledge and experience. This multimedia presentation embraces that imperfect structure of memories, simultaneously highlighting important experiences and relating them to the eventual fate of Pine Point and the current experiences of former Pine Point residents. And I think this is the message of the story, informed by both the elements (or mediums) that comprise it and the content of the narrative itself.

I will end this post with that look into Pine Point, but I think the same can be done for the Invisible Australians and Snow Fall multimedia stories. Hopefully we can discuss this post in relation to them in class. I also wanted to direct you all to two similar “new media stories” that I have recently come across. The first is a Guardian piece, very similar to The New York Times’ Snow Fall story, on the NSA wiretapping scandal. The other is a Scalar project by Erin B. Mee, entitled “Hearing the Music of the Hemispheres”: a “born-digital multimodal article incorporating film, video, and audio clips that are integrated in, and central to, the argument” (text taken from the site). Both are beautifully put-together pieces that raise many of the same issues as our assigned projects/stories and the concepts I have brought up in this blog post.

 

*I am not saying that McLuhan’s argument is right or even generally accepted. I bring it up because I think it causes us to consider the effects of using different forms of media and makes us rethink how arguments, stories, and “messages” are constructed.

Reflections of an Omeka User

I just want to start by saying that Omeka is a great platform. It is useful, relatively user-friendly, and allows humanities students to create and style robust digital collections and exhibits with relatively little coding experience necessary. However, in my experience on the Our Marathon project, I have come to understand a few limitations (or perhaps cautions) of Omeka:

  1. Customization: Omeka is a great platform because it does a finite range of tasks very well. You can build exhibits, catalog items, add metadata, feature items, and even play around with some geographic referencing and timelining (via Neatline). But, as the often-used idiom goes, Omeka is easy to use but very difficult to master. Omeka is organized so that it can fit the largest variety of projects. But as each digital project is unique, the needs of those projects will differ. This is where customization comes in. It requires a higher level of coding experience to make any major overhauls to an Omeka site. I see this as both a strength and a weakness. It is a strength because (depending on your coding experience, or your budget to hire someone with coding experience) you can usually change your site so that it serves your exact purpose. But it also represents a weakness, because one of the best features of Omeka is its active and productive forums and support staff. The more you customize your Omeka site, the less helpful these online forums and support staff will be. The customization is great, but I just want to caution against going overboard on the customizations.
  2. Is Omeka really for you?: I’ve run into this problem a few times already, talking to people who were considering using Omeka only because they felt a lot of others were using it successfully. Before jumping into a platform, make sure you understand the scope, content, and purpose of your project. I’ve seen many projects stall or need to start over because Omeka was not the proper platform for them. When considering a digital project, you need to find the platform that exactly fits your project’s needs. Before jumping right into Omeka, consider other platforms such as Scalar, WordPress, Drupal, or maybe even a combination of a few. Each has its own strengths and weaknesses, but it is important to research them thoroughly before committing to a specific platform.
  3. Plugins are great, 100 plugins might not be!: One of the great aspects of Omeka is the number of incredible “plugins” that add further customization and functionality to your Omeka site. But as I was saying in the first section, be careful. These plugins are great, but they often need some tweaking to function properly, or more efficiently, with your Omeka installation and your collection. Each time you tweak the code to get one plugin to fit, you might be changing something that is vital to the functionality of a different plugin. So, in my experience, the more plugins you have, the more likely it is that subsequent plugins will not work. To avoid this, be smart about which plugins you install and which ones you use. Prioritize the tasks or functions you absolutely want your Omeka installation to perform and try to stick to doing those tasks or functions really well.

 

Video Visualizations

Hey all. I just wanted to post a quick link to a really cool data visualization group’s site. They are called 422 South and do a lot of cinematic data visualizations (similar to my post a few weeks ago about data visualizations). A repository of their data visualization work is available here: http://422.com/work/tag/data-visualisation.

Here is a montage of some of their data visualizations as well:

I think this cinematic quality of data visualizations is something that seems to be underrepresented in the readings for this week. Much of Tufte’s work deals with creating and displaying static visualizations (for example, within the text of a monograph). Jessop mentions that data visualizations require a “visual literacy” that is related to how we look at:

  • “Galleries of images.
  • Museums and collections of objects,
  • Film, Television, and other moving images
  • Dramatic re-creations
  • Maps and atlases
  • Pictures of data…
  • …Single Images” (Jessop 286)

But Jessop is certainly focused on how we look at data visualizations as still images. His article purposefully glosses over the part on “film, television, and other moving images.” Theibault briefly mentions “cinematic mapping,” and Drucker does not talk about video, animation, or cinematic display at all in her piece.

Since being introduced to digital humanities and data visualizations (roughly the past year or so), I’ve noticed an increasing number of video visualizations, particularly on humanities projects. Videos are easier to create these days, and video allows us in the humanities to demonstrate change over time in a way that static two-dimensional visualizations do not. Sure, static visualizations have found effective ways to incorporate a temporal component, but video and animation give us the opportunity to manipulate an image over a set period of time (scaled and correlated to an actual interval of time), as in the toy sketch below.
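To make this concrete, here is a toy sketch in Python with matplotlib (the data and file name are invented, not taken from any of the projects mentioned): each frame reveals one more year of a made-up series, so screen time is scaled to historical time.

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.animation import FuncAnimation

    # invented yearly series standing in for some historical count
    years = np.arange(1800, 1901)
    counts = np.abs(np.cumsum(np.random.randn(len(years))))

    fig, ax = plt.subplots()
    line, = ax.plot([], [])
    ax.set_xlim(years.min(), years.max())
    ax.set_ylim(0, counts.max() * 1.1)
    ax.set_xlabel("year")

    def update(i):
        # reveal one more year per frame
        line.set_data(years[:i + 1], counts[:i + 1])
        return (line,)

    anim = FuncAnimation(fig, update, frames=len(years), interval=50)
    anim.save("change_over_time.mp4")  # requires ffmpeg; plt.show() also works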

For real examples, check out this recent post on the Infectious Texts Project here at Northeastern: (http://www.wired.com/wiredscience/2013/11/data-mining-viral-texts-1800s/). Most of the visualizations included in that article are video visualizations. Here is an example:

Even our professor, Ben Schmidt, utilizes video visualizations in his work on the whaling logs (http://sappingattention.blogspot.com/2012/11/reading-digital-sources-case-study-in.html).

Or take a look at many of the videos at the Spatial History Project at Stanford such as Railroaded.

I mention all these video visualizations just to point out that this seems to be an under-theorized part of looking at, analyzing, and creating visualizations. Although video visualizations are not a new phenomenon, it has certainly become easier to make them and to make them available (YouTube and other video hosting sites). I definitely do not have room to do this in a blog post, but I do think it is important that we theorize and critique these video visualizations as something similar to, but distinct from, static visualizations and graphs. Hopefully we can discuss this more tonight!

A Reflection on Data Visualizations

This week I was catching up on recent blog posts in my RSS feed and stumbled upon a very interesting (and very pretty) visualization of Baltic Sea Traffic (thanks to James Cheshire at http://spatial.ly/ for posting this video!). This visualization got me thinking about the benefits of using visualizations to represent big data, about how visualizations can be argumentative, and about the consequences of these realizations for how we critically examine such visualizations. But first, let’s take a look at the video:

So what does this visualization show us about the potential of using visualizations to represent data? First, it shows us how big data visualizations help viewers understand the scale and quantity of our data. It reminds me of the often-used Stalin quote, “A single death is a tragedy; a million deaths is a statistic.” In the Baltic Sea Traffic visualization, the creators could simply have stated how many ships travel in and around the Baltic Sea on any given day. But that is just a number, and a really large one at that. If they just said there were–to make up a number–100,000 ships per day, would we really be able to understand how many ships there were? Would we get a sense of their routes? I think the answer is no. When I see a number like 100,000, I understand that it is a large number, but I cannot really picture it. This visualization is much more effective at getting the viewer to understand the sheer number of ships moving in and out of the Baltic Sea on a given day. By depicting each ship as its own node, we get to see their movement and interaction in real time. It is a truly chaotic picture, made even more effective by showing the number of accidents, collisions, groundings, and illegal spills in the middle of the traffic visualization. These sorts of big data visualizations give us a way to demonstrate large-scale data more effectively than numbers or prose alone.

This brings up another aspect of visualizations that I think is very important. Visualizations are not just evidence that supports a given argument. They are not just data or information.  Visualizations can be argumentative. Sure you might want to add some prose to explain or flesh out the argument, but this three-minute video mostly lets the moving image speak for itself. And I think it is more powerful because of that.

Finally, if visualizations can be argumentative then we, as critical humanists, must evaluate these visualizations as arguments–meaning we need to consider intentionality and purpose when analyzing the strength and value of these visual arguments. Much like a photograph or a work of art, these visualizations have a purpose and a message that affects the way they are constructed or displayed. This intentionality must be critically examined when evaluating these visualizations.

Zooming In and Out: Close Reading and Distant Reading

After reading through Jockers’s introductory chapters in Macroanalysis (2013), one concept really jumped out at me that I found very valuable for understanding the role of macroanalysis and “distant reading” in the humanities. One usually hears about the close reading/distant reading debate as something that is mutually exclusive—where a humanist either employs close reading or distant reading when analyzing sources. Close readings suffer from what Jockers refers to as “anecdotal evidence” (8), where one hypothesizes overarching theories from a very limited sample. Distant readings, on the other hand, may be able to analyze more texts in different ways, but often result in a loss of the contextual information that a close reading can reveal.

Jockers, however, in his third chapter, “Tradition,” uses a word that I think offers a very valuable way of thinking about the close/distant reading debate. He uses the word “zooming” (in and out) to describe how close and distant readings can be complementary. “Zooming” is a useful and interesting way to describe this phenomenon. It implies a spectrum of scale in text analysis, and in digital work in general. Instead of choosing to do a close reading OR a distant reading on a given corpus, one can zoom in and out along this spectrum of textual analysis. Zooming establishes a complementary rather than a combative interplay between close and distant reading.

And this zooming can be employed across different projects or within the same one. One can “zoom in” on a single work or small corpus of sources, employing a traditional close reading, or one can “zoom out” and perform a distant reading of a million books. But this does not mean that once one zooms in, one cannot zoom out in the same project. The scholar analyzing a single work or small corpus of sources can still benefit from a distant reading of both those sources and the larger corpus of digitized works. They can perform a basic text analysis of word usage, word pairings, and structural components to inform their close reading. Moreover, one can zoom out even further and use a broader text analysis of related works from the period to confirm or support broader claims based on close reading. For example, one might postulate broader societal, political, and religious trends of a certain place during a specific time period based on a close reading. A distant reading of the larger corpus of works from the same region around the same time can support or disprove these speculations. A distant reading, therefore, complements a close reading by acting as a means to sidestep relying only on “anecdotal evidence.” A distant reading can also be supplemented by a close reading of the text. A pure distant reading runs the risk of becoming too abstract or removed from the texts. A close reading of a sample of works from a larger text analysis can ground the broader phenomenological and discursive trends that distant readings attempt to reveal. Consequently, zooming in and out along this spectrum of scale allows close readings to complement distant readings and large-scale text analysis to support the claims of in-depth study of a limited number of sources. Zooming in and out allows those working in the humanities—digital or traditional (for lack of a better word)—to make their arguments with a greater level of precision and efficacy.
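To make the zooming metaphor concrete, here is a toy sketch in Python (the two-document mini-corpus is invented): corpus-wide word counts stand in for the distant reading, and a keyword-in-context view stands in for the close reading.

    import re
    from collections import Counter

    # invented mini-corpus
    corpus = {
        "log_a": "the whale surfaced near the ship and the crew gave chase",
        "log_b": "the ship left port at dawn while most of the crew slept",
    }

    # zoom out: word frequencies across the whole corpus
    tokens = [w for text in corpus.values() for w in re.findall(r"[a-z']+", text)]
    print(Counter(tokens).most_common(5))

    # zoom in: every occurrence of one word, shown in its immediate context
    def kwic(text, keyword, window=3):
        words = text.split()
        for i, w in enumerate(words):
            if w == keyword:
                print(" ".join(words[max(0, i - window):i + window + 1]))

    kwic(corpus["log_a"], "crew")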

 

Explaining Digital Tools and Methods in History Writing

According to Gibbs and Owens in “Hermeneutics of Data and Historical Writing,” new digital methods of data collection, analysis, and display require a new “level of methodological transparency.” They advocate an open documentation and presentation of the process of working with data. This is not only to inform readers of how they reached their conclusions, but also to familiarize readers with the different ways one can use data for historical research and analysis. Gibbs and Owens state, “We need to teach each other how we are using and making sense of data.” And Gibbs and Owens use “data” in a much broader sense: to them, data is not synonymous with evidence. Digital historians and humanists work with data not only as a confirmatory exercise, but also as a means of discovering and framing new research questions. The mere availability of certain data sets, and the tools for interpreting them, opens up exciting new options for historical inquiry. Stephen Ramsay calls this the “hermeneutics of screwing around,” which includes using digital tools to formulate research questions and to produce creative failures that steer your research or analysis in a particular direction.

Gibbs and Owens, however, call for an open and available documenting of the process of using and analyzing data, even these initial steps of discovery and creative failure. I think this open documenting is important and useful. It not only allows your readers to understand how you collected, used, analyzed, and manipulated your data, but also serves as a way for you to familiarize your audience (particularly your non-digital colleagues) with using these new tools and data sets.

But I cannot help but wonder what this transparency will look like. Let us assume that someone is trying to publish a traditional monograph while being as transparent as Gibbs and Owens suggest in their piece. Will it be in the form of an exhaustive and detailed introduction? That might discourage readers from looking at the rest of your work. What if it were included at the end of the monograph in the form of an appendix (or appendices)? That might discourage readers from even reading the section. I know from personal experience that appendices are often skimmed over, if not ignored entirely. What about blogging about the process of researching and writing your monograph? This would allow you to avoid the first two problems, but by separating it from the monograph you risk having your readers not be aware of, or not have access to, your blog. It would have to be explicitly mentioned in the monograph, and, even then, you cannot guarantee your readers will check out your site. The most effective way of integrating this transparency into your text might be to present your monograph in a digital format, such as Gibbs and Owens’ chapter, layering your methodology and process through a series of visualizations, hyperlinks, and other pages. But even that has its drawbacks. In academia, where peer review and publishing still play such a significant role in hiring and tenure decisions, can someone other than a tenured professor risk presenting their entire work online? Even then, would they?

Now, I do not have any “answers” to this issue, but I think it is useful for anyone considering doing digital work to think clearly about how to represent their research and analysis in the most effective way. Maybe an exhaustive introduction to the digital work could work out best. Maybe it is best decided on a per-project basis. Or maybe one might consider a combination of these strategies (i.e., presenting both a digital and a print format, or including a digital companion to a hard-copy work). If you are aiming for the kind of transparency that Gibbs and Owens are suggesting, these are issues you must confront.

Rectifying Maps for the NYPL

For this week’s “making things digital” class, I decided to do something a little different from digitizing text. When I saw Ben’s post of suggestions, I was immediately drawn to the last option: rectify maps for the New York Public Library. I had done a bit of basic GIS before and was intrigued that they had a rectifying tool built into the site, rather than requiring complex and expensive GIS software.

I went to the site, watched their video tutorial (not the best quality video, but it told me exactly what I needed to know), and decided to give it a try. Rectifying a map involves three main elements: the historical map, the base map, and your control points. In order to rectify a map, the user places control points on matching locations on both the historical map and the base map. These control points are paired with each other. By carefully placing enough of these control points, the user can warp the historical map to match up with the modern base map.
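Under the hood, this pairing amounts to fitting a transformation that maps positions on the historical map onto positions on the base map. Here is a minimal sketch in Python with numpy; the control points are invented, and a simple affine transform stands in for whatever warp the NYPL tool actually applies. The per-point errors at the end behave like the “margin of error” the warper reports (discussed below).

    import numpy as np

    # invented (x, y) control points on the historical map...
    src = np.array([[120, 340], [410, 95], [560, 480], [230, 610]], dtype=float)
    # ...and their invented matches on the modern base map
    dst = np.array([[118, 352], [405, 101], [570, 469], [244, 598]], dtype=float)

    # least-squares fit of an affine transform mapping src onto dst
    A = np.hstack([src, np.ones((len(src), 1))])  # add a constant column
    T, *_ = np.linalg.lstsq(A, dst, rcond=None)   # T has shape (3, 2)

    warped = A @ T                                 # where the warp puts each point
    errors = np.linalg.norm(warped - dst, axis=1)  # per-point "margin of error"
    print(errors)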

The next step was to choose what kind of maps I wanted to rectify. I wanted a place and scale I was familiar with, so I started searching for historical maps of my home state, New Jersey. I found two maps that I found very interesting and began working on rectifying them. One is a 1795 engraving of New Jersey by Joseph Scott of The United States Gazetteer (Philadelphia), and the other is an 1873 map of New Jersey from the Atlas of Monmouth co., New Jersey. Here are images of the historical maps before rectifying them:

NJ Map 1795 (before)

NJ Map 1873 (before, some control points shown)

I have decided to include some of my control points in the 1873 map so that you can see what they look like. In order to properly rectify a map, you must have more than one control point; the NYPL site requires at least three control points in order to rectify the historical map against the base map. The warper also includes a mechanism that reports how far off (the margin of error) each of your control points is between your historical map and your base map–in effect, something like the per-point errors in the sketch above. The tutorial video instructs you to make sure each of your control points has a margin of error of less than 10. Going into this, I assumed that more control points linking my historical map to my base map would result in a more accurate rectified map. However, this holds only if you can keep your control points under that margin of error of 10. Adding more control points can also distort the margin of error of your other control points. So it is not always best to have the greatest number of control points; instead, one should place control points in optimal positions yielding the least margin of error. Each map is also unique, so you need to work out what the best arrangement and number of control points is. I am not saying that my rectified maps are perfect (they are far from it), but I found that around six control points did the trick.

After placing these control points, I cropped the historical map a bit so that it would fit better on the base map, clicked “Warp Image!,” and then played around with the transparency settings of the historical map in order to produce these new rectified maps:

NJ Map 1795 (after)

NJ Map 1873 (after)

Now I will offer a few final thoughts about the process (although I definitely expect to do this again). First, rectifying maps is a frustratingly precise process. Borders, state lines, and towns on the base map are often in very different locations (or non-existent) on the historical map. Also, when you are placing control points, you have to be constantly aware not only of whether your control points line up with the correct locations on the historical map and the base map, but also of how each control point affects the margin of error of every other control point you have placed. For example, I tried rectifying a map of the United States and was able to place three control points with very little margin of error for each. I placed them at the northwestern corner of Washington, the southwestern corner of California, and the southernmost point of Texas. However, no matter where I placed the next control point, the margin of error seemed to skyrocket for all four points as soon as I placed the fourth one. This might have been a problem with the first three points, but it did prompt me to scale down my efforts from maps of the entire United States to New Jersey maps.

There is one last thing I wanted to comment on, and it deals with the base map. I was thinking about how the entire process of rectifying these maps involves warping the historical map to fit the base map. This one-way process assumes that the base map is the accuracy standard and that all other maps must conform to its scale and borders. I think this assumption is taken for granted. I understand the need to have a standard map, but could it not also be useful to have the program do the reverse? What if it generated an overlay of the historical map on the base map AND an overlay of the base map on the historical map? What kind of value would an arrangement like that have? I am not sure, but I think it is something that at least needs to be considered. Also, there are many historical maps that contain different information than the base map and are, therefore, incompatible with the rectifying process (although they are still listed on the site). I just wonder whether, by placing such confidence in the base map, we are losing important information from the historical map. I’ll finish this post by showing one of those maps that are listed on the NYPL site but could not possibly be rectified to our modern base map. There are many of them, but this one in particular stuck out as a very valuable and informative map that is completely incongruous with the base map.

1671 Depiction of Floridans