Facing Challenges in Digital Visualization

These four readings demonstrate the usefulness and importance of visualizations in historians’ work, particularly those built with new digital methods, but they also lay out a series of challenges and difficulties facing the field with regard to their use. Most of these challenges lie in the deep-seated methodologies that have characterized the field and in the need to become proactive rather than reactive toward changing and improving technologies. If we were to zoom out on the challenges of digital visualizations in the humanities, we would find that they are closely tied to the challenges facing the humanities as a whole, as we discussed earlier in the semester. Within the area of digital visualizations, we find that we must not only teach practitioners how to use digital tools in a way that will be effective and clear to the audience, but also teach the audience how to understand digital works.

The use of graphics to represent information in the humanities is not new; in fact, both Jessop and Tufte discuss ways in which graphics have been and can be used throughout the study of the humanities. However, new technologies can make these graphics much more effective by “escaping the flatland” of traditional methodologies. Tufte also discusses how various approaches to visualization can affect the way that the audience perceives the information provided. This idea is discussed in Theibault’s article as well, which addresses the need for greater transparency to aid understanding throughout the study of history and how digital visualizations can be a solution to this problem. He sees this as an area where the “hermeneutics of data” would come into play. Theibault shows how digital visualizations, rather than hiding one’s research methods, can help solve the problem of “balancing honesty in visual rhetoric and clarity and persuasiveness.” This is seen in the inclusion of helpful “how to read” sections that often accompany graphics to make these visualizations more accessible and understandable to a broader audience. Once again we find that historians must adapt to the new technologies and make themselves more literate in digital visualizations.

Jessop also touches upon this need to make visualizations more accessible through the improvement of visual literacy education within the humanities. Visual literacy education would involve several levels, such as teaching students to understand digital visualizations better and teaching those in the humanities how to create more understandable digital visualizations through a multi-disciplinary approach involving professional artists. We see in the potential solutions, once again, a connection to the needs of the humanities on a broader scale, such as the need for better collaboration between disciplines and for improved digital education to gain wider acceptance in the field. Drucker’s article discusses the ways in which these digital visualization tools can be used most effectively, showing that new methods of looking at data can help to make these visualizations more easily understood and accepted.

Overall, these articles (and the book) help to demonstrate that, as with previous arguments we have seen, there is a great need within the practice of history, and the humanities as a whole, to improve methods of education regarding how to effectively use digital methods. In a side

Producing Space

Spatial history, especially regarding projects done on the scale of Stanford’s ORBIS project, appears to benefit from developing technologies and digital tools in a more fundamental manner than traditional, text-based research and analysis. “Zooming out” from texts for big-picture ideas is clearly facilitated by many of the tools we discussed earlier in the class, but it is still an extension of the concept of texts as an authority, and researchers can still theoretically read for the information they are looking for. With some of the key information of spatial history, by contrast, there is a certain extent to which the data cannot be accessed until it is visualized. These representations, in order to reflect change over time, also have to contend with different variables and dimensions, which often can be portrayed through interactivity. Walter Scheidel, in “The Shape of the Roman World”, describes some of the capabilities of the ORBIS project by way of previous attempts to understand space and history without the technological means now available. In his assessment of Fernand Braudel’s maps and “the human struggle against distance”, Scheidel notes: “[Braudel’s] pioneering efforts were narrowly circumscribed by the resources available at the time” (2). Scheidel references the importance of the actual computing power required to deal with the various types of data that would have to come together to form an understanding of human movement through space.

This idea of movement and space is also heavily emphasized in Richard White’s article “What is Spatial History?” Space, including its constructions and its representations, has fairly consistently been understood to affect human behavior, especially in the wake of Henri Lefebvre’s historical theories. This brought to mind an example from another class, where the associations of behavior and space were explicitly rendered in architectural history. Robert Weyeneth, in “The Architecture of Racial Segregation”, summarizes the “spatial strategies of white supremacy” (12) and specifically the current legacy behind the idea that “the architecture of racial segregation represented an effort to design places that shaped the behavior of individuals” (13). Referring to physical constructions such as separate drinking fountains, restrooms, ticket booths, theater seats, and a host of other segregated spaces, Weyeneth also describes some of the human behaviors elicited by these spaces. He notes, for instance, that “one measure of the success of the civil rights struggle was the dismantling of segregated space” (38). While this instance of architectural history does not match the scale or technical demands of many spatial history projects, it models the effect of movement, which in this case is clearly institutionally manipulated. White makes the point that “we produce and reproduce space through our movements and the movements of goods that we ship and information that we exchange” (3). How exactly that space is produced and reproduced is the insight that visualizing trends in movement can help form, though the patterns are not always as obvious as signs on a fountain. The scale of projects like ORBIS, in bringing together an enormous amount of data in a very general way, shows the cumulative effects of elements like winds and ocean currents, which are otherwise invisible, unlike the physical features of a building.

History on the Head of a Pin

Richard White, in his article “What is Spatial History?”, discusses the common lack of spatial analysis in historians’ explanations of change over time. While there is not always a lack of attention to space, the preference for chronological analysis draws criticism from geographers, who claim that historians write history as if it occurred “on the head of a pin”. Of course, he proceeds to show examples where that is not the case. I was struck by this criticism, which I had not previously considered. Clearly, space is a very important consideration in historical research. I am considering some interesting spatial research questions concerning South Africa in the 1920s. First, natives, who represented 90+% of the population, were restricted to 13% of the land. I question the state of the land allocated to natives. Many historians have claimed that it was substandard land that lacked access to primary population and economic centers. Second, I am curious about population density maps and how they changed as South Africa transformed into an Apartheid state. Finally, I am curious about transportation between these spaces, similar to the research done by Stanford University’s project on the Roman Empire.

As we begin to think about the implications of space for change over time, a couple of points that White raises merit some discussion. Absolute space, the measurement of distance in terms of miles or meters, is, I would argue, less important to historians than relational space, which interprets space in terms of cost, time, or other changing factors. I think of extreme examples. South African natives during Apartheid might live merely a few miles from family or work, and yet because of regulations and danger these locations were inaccessible. While these factors are challenging to represent visually in absolute space, ORBIS: The Stanford Geospatial Network Model of the Roman World demonstrates some of the possibilities for representing them. Cost analysis, in both time and money, plays a major role in the connections made in this interactive map. Ironically, to me, what does not play a major role in this spatial network is time itself, specifically the change in these routes over time. In the nearly 500 years of the Roman Empire, transportation must have undergone some technological changes that altered route, cost, and time, none of which are represented as far as I could tell. It is as if the historians working on this project traded history on the head of a pin for history frozen in time.
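To make the contrast between absolute and relational space concrete, here is a minimal sketch in Python. It is not how ORBIS is implemented; the three-place network, its place names, and every figure in it are invented for illustration. The same shortest-path search is run twice, once weighting routes by kilometers and once by days of travel, and the "closest" route changes.

```python
import heapq

# Hypothetical toy network. Each edge carries a distance (km), a travel
# time (days), and an expense (arbitrary units). None of these figures
# come from ORBIS; they are made up for illustration.
EDGES = [
    ("A", "B", {"km": 100, "days": 5.0, "cost": 20}),  # slow mountain road
    ("A", "C", {"km": 180, "days": 2.0, "cost": 8}),   # coastal sea leg
    ("C", "B", {"km": 120, "days": 1.5, "cost": 6}),   # river leg
]


def build_graph(edges):
    """Turn the edge list into an undirected adjacency mapping."""
    graph = {}
    for a, b, weights in edges:
        graph.setdefault(a, []).append((b, weights))
        graph.setdefault(b, []).append((a, weights))
    return graph


def best_route(graph, start, goal, weight):
    """Dijkstra's algorithm, minimizing whichever weight key is named."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == goal:
            return total, path
        if node in seen:
            continue
        seen.add(node)
        for neighbor, weights in graph.get(node, []):
            if neighbor not in seen:
                heapq.heappush(
                    queue, (total + weights[weight], neighbor, path + [neighbor])
                )
    return None  # goal not reachable from start


graph = build_graph(EDGES)
print(best_route(graph, "A", "B", "km"))    # route measured in absolute space
print(best_route(graph, "A", "B", "days"))  # route measured in relational space
```

In this toy case the direct road is the shortest in kilometers, but the longer detour through C is faster. A version that addressed the "frozen in time" criticism would need a separate set of edge weights for each period, which this sketch does not attempt.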

Spatial History Creating New Avenues of Study

Space is an incredibly broad concept, and one that Richard White says historians typically write about as if it matters little. This comes as little surprise because of the many layers and factors that go into determining space. In doing this week’s readings and looking around the ORBIS website, we see that by looking at relational space through ArcGIS, historians can open up entirely new avenues of thought and research that were not available to them before. It also becomes abundantly clear that movement, especially in the ORBIS project, is a key ingredient in determining and visualizing spatial history, as per the argument in “What is Spatial History?”

We have spoken many times about textual history and other traditional modes as being confined to chronological accounts of the human experience. For me, this week’s readings show the biggest break from the traditional. By layering together the different types of movement and the factors that go into it, we are able to understand relational space in a way that historians had never been able to before. We often think of space in terms of distance in miles, kilometers, and so on, but in looking at it in a different way, in relation to movement as with the ORBIS project, we see for instance the importance of time in determining how far away something appears to be. We also see that in determining distance and time we have to factor in mode of transportation, weather, and price. By taking all of these into account we can more fully understand the meaning of space, in this case in ancient Rome.

While there is clearly a lot of research that goes into the production of these maps, White is also correct in his assessment that this is an open-ended project that actually produces more questions for historians than answers. By creating projects that take spatial history into account, we open up entire avenues of study, as seen in the “Applying ORBIS” section of the website. For example, in reading “ORBIS and the Ancient Itineraries: Preliminary Observations” by Dan-el Padilla Peralta, we see how “For future researchers, ORBIS will be useful not only as a means for gaining purchase on the realities of mobility in the Roman Mediterranean, but as a tool for evaluating gradations in the nature and quality of the information furnished by the IA and other itineraries for road travel in the Empire’s various regions.” This is just one of the incredibly broad areas that have been and can be explored thanks to the work done on ORBIS. The opportunities provided to historians in mapping history are certainly much greater than I had initially expected, and one can see why and how space had previously been overlooked in the more traditional and textual modes of studying history.

Experimenting with OCR

In my digital history class we recently discussed OCR (optical character recognition) software, which transcribes scanned images of text. I was curious as to how to use such a program, so I decided to give it a trial run.

I bought a copy of the Boston Globe one Friday, with the intention of scanning it and running it through OCR. I selected three articles to scan: two were local and one was national. I decided to focus on one article in particular for a sample transcription. It was about a strike between the union and top officials of Boston’s public transportation system. It might certainly be of interest one day to economic historians.

To scan the newspaper, I had to go to my school library. The scanner interface was pretty simple. It allowed me to select the kind of file I wanted to save the scans as (I chose PDF format) and whether to scan them in color, greyscale, or black and white (I chose black and white on the advice of my professor). Conveniently, the library computer gave me the option of scanning the newspaper directly to an email attachment, so I was able to access the scans on my computer almost immediately.

One of the scanned pages

The complete set of scans, in their designated folder

After installing Tesseract, the OCR software, I had to move the scans into an isolated folder (labeled “pdfs”) and download a file that would make it possible to type in an appropriate command line and allow Tesseract to recognize the scans as an object to transcribe. I then typed in the command “cd” and dragged the folder from the Finder application (I use a Mac) into the command line, which made the complete command:

With this command entered, I could then command Tesseract to transcribe the scans. To do so, I typed in the command “sh” followed by the name of the “director” file and the name of the input file (the sh command is used to specify the input file):

All that was left to do was to hit “enter,” and Tesseract converted the pdf and transcribed it, with the following result:

The "images" and "texts" folders were created by Tesseract; the "Tesseract-1" file is the "director" file

The sample article, transcribed

As the above picture shows, Tesseract did a fairly decent job of transcribing the article accurately. It even reflected the newspaper’s text margin and was able to recognize separate articles, even though their print may have been horizontally aligned on the same page. The program did have difficulty in some areas, though. Not surprisingly, the minor text at the top of the page, such as the letter identifying the section of the newspaper and the stock market indexes, was sloppily transcribed:

There were also some peculiar spelling errors:

"MBTA officials"

A potentially more serious problem was that at one point, Tesseract “misread” an article and aligned the text of another article in the middle of my sample:

The Original

The (Incorrect) Transcription

Nevertheless, I completed the transcription myself. Pretending that this would one day actually be used by historians, I rearranged the transcription into a more compact format, added the missing bits, and corrected spelling errors.

The Final Copy

My results using OCR were mixed, but overall, it does expedite the process of transcription, and its errors can fairly easily be caught by a simple review. Frederick Gibbs and Trevor Owens have argued in their essay “The Hermeneutics of Data and Historical Writing” that descriptions of the methods of the digital humanities need to be included in the historical literature, so that potential inaccuracies may be spotted. As far as this very limited example, OCR, is concerned, I am not convinced that historians need to spell out the fact that they used digitally transcribed sources and the process this method of transcription entails, so long as the transcriptions are diligently checked for accuracy. What might be needed more in this case is simply for the “bugs” of the software to be corrected; since software is not something static (there are many versions and updates), it seems like OCR has the potential to develop into a very powerful tool.
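The review step could be partly automated. The following is a minimal sketch, not the process used in this post: it relies only on Python’s standard `difflib` module, and the sample strings (including the misspelling) are invented stand-ins for a raw OCR transcription and a hand-corrected copy.

```python
import difflib


def ocr_similarity(ocr_text, corrected_text):
    """Return a 0..1 similarity ratio between the two character strings."""
    return difflib.SequenceMatcher(None, ocr_text, corrected_text).ratio()


def changed_words(ocr_text, corrected_text):
    """List (ocr_version, corrected_version) pairs wherever the words differ."""
    ocr_words = ocr_text.split()
    good_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, good_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append(
                (" ".join(ocr_words[i1:i2]), " ".join(good_words[j1:j2]))
            )
    return changes


# Invented sample strings, for illustration only.
raw = "MBTA oflicials met with the union"
fixed = "MBTA officials met with the union"

print(round(ocr_similarity(raw, fixed), 3))
print(changed_words(raw, fixed))
```

A list of changed words like this would also give a historian something concrete to report if, as Gibbs and Owens suggest, the transcription method were to be documented alongside the research.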


Communicating Data: Visualizations

One of the strongest features recommending data visualization is the opportunity for effectively streamlining communication based on the immediacy and intuitive nature of our sense of sight. In the tradition of “a picture is worth a thousand words” is the acknowledgment that the information we absorb visually can be far more efficiently presented than purely with text or numbers. This is evident in mapping networks of actors, where the influence of each defined in physical space draws out an automatic reaction. Shin-Kap Han’s analysis of Paul Revere’s ride and mobilization not only revealed the unique placement of Revere and Warren in the network of revolutionary actors, as well as a visual scope of their influence, but also served to communicate the basis for a “critical corrective” regarding the incentives of brokerage.

In a fortuitously timely conversation I had with a close friend and graphic designer, I was directed to one of the earliest influential incarnations of deliberately designed graphic representations of information. Charles Joseph Minard released this depiction of Napoleon’s invasion of Russia (1812) in 1869:

This notorious military failure is given a shape that is more directly communicative than numbers, and simultaneously tracks multiple variables (direction, temperature, location) in a single presentation. The representation, done without the aid of technology available today, still compresses a significant amount of information mapping the relative influence of multiple factors on Napoleon’s army. The visual representation of the army’s decimation has a more immediate effect and comprehensive scope than a straight recounting of losses and environmental conditions.

Clearly there are ways in which these representations are then subject to manipulation and generate different conclusions due to the variability of interpretation. Caroline Winterer mentions geographic illusions that come up in mapping the correspondence of both Franklin and Voltaire: “depending on how we interpret the maps, we can call Franklin either more peripheral than Voltaire to the republic of letters (since much of his activity emerged from the colonial periphery), or more worldly than Voltaire (since his network reached across the Atlantic)” (Winterer, 611). The near-instantaneous communication of the map plots does not contribute to forming an interpretation in either direction. It is still up to the scholar to understand the nature and content of these letters, or to adopt a perspective from which to derive insight from the newly visualized data.

Thoughts on the effect of distant reading on research

I have been researching the collaboration of Black organizations in South Africa during the 1920s. During these years, Marcus Garvey’s UNIA movement was gaining momentum in Africa, specifically in South Africa and Liberia; the South African Communist Party was founded; the Industrial and Commercial Workers’ Union (ICU), a sister organization to the IWW, was founded; the African National Congress was gaining momentum in South Africa; and religious rebellions were taking place. While this topic has not attracted a great deal of attention from historians interested in South Africa, there is a wealth of sources, many easily accessible following the Truth and Reconciliation Commission. The challenge has been sorting through all the useful sources. For example, I have more than 50 speeches, articles, and documents from these organizations specifically from the 1920s, with many more that I am aware of. There is also an abundance of newspapers published or circulated by these groups at this time. My project has been to understand the social and political climate of the native community in South Africa during this interim period between South African independence from Britain and the establishment of Apartheid.

Digital history has begun to open to me new ways of reading the many sources that I have been struggling with, as well as new questions to ask of these sources. This week I have been tracking down these sources by sifting through online archives and picking out nearly all sources from the 1920s in South Africa. Many of these I had seen before and yet passed by because I simply could not read them all. While searching this week I was not concerned about my ability to read each source, nor was I choosing only a selection that seemed specifically interesting to me; instead I was thinking that by using text analysis to “read” these sources I would actually get a more complete representation of these organizations. We have discussed concerns that digital history methods further distance us from history, making it a less personal practice of statistics, graphs, and text analysis rather than the telling of a story, but this week I was realizing that these tools will allow me to do further research and tell a more complete story.
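As a rough illustration of what “reading” a set of sources with text analysis can mean at its simplest, here is a minimal word-frequency sketch in Python. It is an assumption about one possible first step, not the tool or method used for this project; the two sample documents, their names, and the short stopword list are invented placeholders.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "for", "is"}


def word_counts(text):
    """Count lowercase words in one document, skipping stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def corpus_counts(documents):
    """Add up word counts across a mapping of document name -> text."""
    total = Counter()
    for text in documents.values():
        total.update(word_counts(text))
    return total


# Invented placeholder texts, not quotations from any real source.
documents = {
    "speech_1": "The land belongs to the workers.",
    "article_1": "Workers and land: the union speaks for the workers.",
}

counts = corpus_counts(documents)
print(counts.most_common(2))
```

Counting words across dozens of speeches and newspapers will not replace close reading, but it can point to which sources and themes deserve that closer attention.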

Visualizing Networks

It’s difficult to live in Boston and not have a preconceived notion of Paul Revere’s contributions to the American Revolution. Usually these notions fall into two camps: those who love Longfellow’s poem and believe Revere was the sole rider and the most important person on the night before Lexington and Concord, and those who believe he is an overrated figure who doesn’t deserve the praise and fame brought to him through Longfellow’s work. Having worked as a State House tour guide for a year, I confronted many interpretations of Paul Revere’s ride through my daily interactions with the public. As a public historian, one is often put in this position: how do we navigate national memory, invented traditions, and personal beliefs while conveying some sort of historical truth to our audience?

This network analysis of Paul Revere’s role during the Revolution, to me, helps to find a solution to this problem where words and other more “traditional” methods of history have thus far failed us. By placing data into these illustrations and charts, it becomes clear what Revere’s (and Warren’s) roles were in the Revolution. We find that the answer lies somewhere in between the two camps that are typically formed. As Dave states in his post, “these sorts of big data visualizations give us a way to demonstrate large-scale data in a more effective way than just numbers or prose.” We may not be talking about large-scale data when it comes to Paul Revere, but the same idea applies. Visualizations are not only a way to accompany a scholarly work; they can stand alone quite effectively as well. To me this does what Winterer discusses in “Where is America in the Republic of Letters”: “like a satellite hovering above the Earth, visualization can help us to see the big picture amid bewildering complexity and to detect new patterns over time and space.”

Winterer’s work differs from that of Shin-Kap Han, who works with physical data and membership information, while Winterer uses letters that are difficult to digitize and categorize. Winterer demonstrates how correspondence networks can be used to show globalization trends and to examine data across national borders as well as within them. From both articles we see that networks serve as a valuable way to examine the past from a new perspective.
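To show how membership information can become a network at all, here is a minimal sketch in Python. It is not Han’s method or data: the groups and people are hypothetical labels, and the measure used (how many distinct people someone shares a group with) is a much simpler stand-in for the brokerage analysis in the article.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical membership lists: group name -> set of members.
MEMBERSHIP = {
    "club_1": {"p1", "p2", "p3"},
    "club_2": {"p3", "p4"},
    "club_3": {"p3", "p5", "p6"},
}


def co_membership(membership):
    """Count, for each pair of people, how many groups they share."""
    ties = defaultdict(int)
    for members in membership.values():
        for a, b in combinations(sorted(members), 2):
            ties[(a, b)] += 1
    return dict(ties)


def degree(ties):
    """Count how many distinct people each person is tied to."""
    neighbors = defaultdict(set)
    for a, b in ties:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {person: len(others) for person, others in neighbors.items()}


ties = co_membership(MEMBERSHIP)
print(sorted(degree(ties).items()))
```

In this invented example, p3 belongs to all three groups and so is tied to everyone else, while the other members only know people in their own group. A network diagram makes that kind of bridging position visible at a glance.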

A Reflection on Data Visualizations

This week I was catching up on recent blog posts in my RSS feed and I stumbled upon a very interesting (and very pretty) visualization of Baltic Sea Traffic (thanks to James Cheshire for posting this video!). This visualization got me thinking about the benefits of using visualizations to represent big data, how visualizations can be argumentative, and what these realizations mean for how we critically examine such visualizations. But first, let’s take a look at the video:

So what does this visualization show us about the potential of using visualizations to represent data? First, it shows us how big data visualizations help viewers understand the scale and quantity of our data. It reminds me of the quote often attributed to Stalin, “A single death is a tragedy; a million deaths is a statistic.” In the Baltic Sea Traffic visualization, sure, they could just say how many ships travel in and around the Baltic Sea on any given day. But that is just a number, and a really large one at that. If they just said there were (to make up a number) 100,000 ships per day, would we really be able to understand how many ships there were? Would we get a sense of their routes? I think the answer is no. When I see a number like 100,000, I understand that it is a large number but I cannot really picture it. This visualization is so much more effective at getting the viewer to understand the sheer number of ships moving in and out of the Baltic Sea on a given day. By depicting each ship as its own node, it lets us see their movement and interaction in real time. It is a truly chaotic picture, which is made even more effective by showing the number of accidents, collisions, groundings, and illegal spills in the middle of the traffic visualization. These sorts of big data visualizations give us a way to demonstrate large-scale data in a more effective way than just numbers or prose.

This brings up another aspect of visualizations that I think is very important. Visualizations are not just evidence that supports a given argument. They are not just data or information. Visualizations can be argumentative. Sure, you might want to add some prose to explain or flesh out the argument, but this three-minute video mostly lets the moving image speak for itself. And I think it is more powerful because of that.

Finally, if visualizations can be argumentative, then we, as critical humanists, must evaluate these visualizations as arguments, meaning we need to consider intentionality and purpose when analyzing the strength and value of these visual arguments. Much like a photograph or a work of art, these visualizations have a purpose and a message that affects the way they are constructed or displayed. This intentionality must be critically examined when evaluating these visualizations.