I’m updating this from the 2015 syllabus, but things in this field have changed a lot, and the syllabus will change too before we start. I’ve also stolen a lot from Ryan Cordell’s 2017 offering of this course.
Notes: mostly we’ll be reading articles in this course available online. A few books are required for purchase. If you have difficulty obtaining any texts, please let me know as soon as possible.
In week 1, you’ll read my spiel about what humanists need to understand when they read CS. My answer, in general, is that you need to know what they did, but not how they did it. I’ve put some CS papers in this syllabus to expand your thinking about what’s possible. You should absolutely, positively not aim to understand the process. As a rule, if you see a fancy equation in an article not written by a humanist, you can probably skip the whole section for the time being.
Thursday, January 10
Problem set: Regex practice. Note: Regular expressions embody pretty much everything that is miserable, ugly, and inelegant about computer programming. But they’re basically indispensable for actually manipulating data in the real world. So we begin with a baptism by fire!
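To give a flavor of what regular expressions do (the actual problem-set exercises aren't reproduced here), here is a minimal sketch that pulls a publication year out of citation strings adapted from this syllabus's bibliography. It uses Python's `re` module rather than R, though the pattern syntax is largely shared. The accepted year range (1500 to 2099) and the rule of keeping the last match are assumptions chosen for illustration.

```python
import re

# Citation strings adapted from the bibliography below (DOIs and titles trimmed).
records = [
    "Tukey, John W. Exploratory Data Analysis. Reading, Mass: Addison-Wesley Pub. Co, 1977.",
    "Mosteller, Frederick, and David L. Wallace. Journal of the American "
    "Statistical Association 58, no. 302 (1963): 275-309.",
    "Allison, Sarah, et al. Quantitative Formalism: An Experiment. "
    "Stanford: Stanford Literary Lab, n.d.",
]

# Assumed rule: a year is a standalone four-digit number from 1500 to 2099.
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")

def extract_year(record):
    """Return the last plausible year in a citation as an int, or None."""
    matches = YEAR.findall(record)  # with one group, findall returns the group strings
    return int(matches[-1]) if matches else None

for r in records:
    print(extract_year(r))
```

The `\b` word boundaries stop the pattern from matching digits buried inside longer numbers, and volume, issue, and page numbers like `58`, `302`, or `275-309` are skipped because they aren't four digits long.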
Thursday, January 17
read_table and tidyr
tidyr to manipulate a data set into a form for network representations.
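Many network tools take an edge list as input: one row per connection, with a source and a target. Here is a minimal sketch of that reshaping in Python rather than the R/tidyr tooling the course uses. The correspondence data and names are hypothetical. Each wide row packs several recipients into one cell, and the function unpacks them into long (sender, recipient) pairs, then counts repeated pairs as edge weights.

```python
from collections import Counter

# Hypothetical wide-format rows: one row per letter, with every
# recipient packed into a single semicolon-separated cell.
letters = [
    {"id": 1, "sender": "Ada", "recipients": "Ben; Cy"},
    {"id": 2, "sender": "Ben", "recipients": "Ada"},
    {"id": 3, "sender": "Ada", "recipients": "Ben"},
]

def to_edge_list(rows):
    """Reshape wide rows into long (source, target) pairs, one per recipient."""
    edges = []
    for row in rows:
        for recipient in row["recipients"].split(";"):
            recipient = recipient.strip()
            if recipient:  # skip empty cells left by stray separators
                edges.append((row["sender"], recipient))
    return edges

edges = to_edge_list(letters)
# Collapse duplicate pairs into weighted edges: (source, target) -> count.
weights = Counter(edges)
print(weights)
```

The long form is also what makes counting easy: once every connection is its own row, edge weights are just a tally of identical rows.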
Split %>% apply %>% combine
Thursday, January 24
A huge amount of work is just finding interesting things to count. Often, sophisticated work is simply figuring out how to count something new. Here we look a little at how you can simply count something.
Problem Set: Split/Apply/Combine
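Split/apply/combine is counting in miniature: split rows into groups, apply a summary to each group, and combine the results into a new table. The sketch below does this in plain Python with hypothetical page counts by decade; in the course's R setting the same three steps are usually written as a grouped summary. The field names `decade` and `pages` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: one per book, with a decade and a page count.
books = [
    {"decade": 1850, "pages": 300},
    {"decade": 1850, "pages": 500},
    {"decade": 1860, "pages": 240},
    {"decade": 1870, "pages": 400},
    {"decade": 1870, "pages": 380},
    {"decade": 1870, "pages": 420},
]

def split_apply_combine(rows, key, value):
    """Group rows by `key`, count and average `value`, return one row per group."""
    # Split: collect each group's values under its key.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Apply + combine: summarize each group, then assemble a table sorted by key.
    return [
        {key: k, "n": len(vals), "mean": sum(vals) / len(vals)}
        for k, vals in sorted(groups.items())
    ]

for row in split_apply_combine(books, "decade", "pages"):
    print(row)
```

The `n` column alone is already a count of books per decade. The mean is just one more summary applied to the same groups.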
Do a bit of research to find some tabular data about something you’re interested in that you can bring to class.
Good data for these purposes.
Exception/special case: is there textual data you can work with?
%>%
visualization
Thursday, January 31
Problem set: Visualization
ggplot2
Some documentation is available in Wickham, “Ggplot2.”
Penumbral:
Reading postponed; keep working on the visualization task with the HathiTrust books.
%>%
data
Thursday, February 14
%>%
the Embedding Strategy
Thursday, February 21
Modern machine learning requires data, but that data doesn’t look like an XML or TEI representation. Instead, a particular trick for turning items into strings of numbers (the embedding strategy) has emerged as the dominant way for computers to represent information to themselves. So we’ll talk about that strategy, and how to get things in and out of it.
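Here is a deliberately crude stand-in for the embedding strategy, showing both directions. Getting things in means turning each text into a fixed-length vector of numbers; getting things out means comparing vectors to find related items. It uses raw word counts over a shared vocabulary, not the learned dense embeddings the readings discuss, and not the random-projection method in Schmidt (2018). The documents and query are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def build_vocab(docs):
    """Map each distinct token to a fixed column index."""
    vocab = sorted({tok for d in docs for tok in tokenize(d)})
    return {tok: i for i, tok in enumerate(vocab)}

def embed(text, vocab):
    """In: turn a text into a vector of token counts (unknown tokens dropped)."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    return vec

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query, docs, vocab):
    """Out: return the document whose vector is most similar to the query's."""
    q = embed(query, vocab)
    return max(docs, key=lambda d: cosine(q, embed(d, vocab)))

docs = [
    "the whale and the sea",
    "the sea and the ship",
    "a letter about the harvest",
]
vocab = build_vocab(docs)
print(nearest("whale at sea", docs, vocab))
```

Real embeddings replace these sparse count columns with a few hundred dense, learned dimensions, but the basic workflow (items in as vectors, similarity comparisons back out to items) is roughly the same.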
Supplemental readings (technical explanations):
Thursday, February 28
Thursday, March 14
Problem sets:
Methods:
Thursday, March 21
Problem sets and methods:
Thursday, March 28
Thursday, April 4
Wednesday, April 11
Allison, Sarah, Ryan Heuser, Matthew L. Jockers, Franco Moretti, and Michael Witmore. Quantitative Formalism: An Experiment (Stanford Literary Lab, Pamphlet 1). Stanford: Stanford Literary Lab, n.d.
Behrens, John T. “Principles and Procedures of Exploratory Data Analysis.” Psychological Methods 2, no. 2 (1997): 131. http://psycnet.apa.org/journals/met/2/2/131/.
Blevins, C. “Space, Nation, and the Triumph of Region: A View of the World from Houston.” Journal of American History 101, no. 1 (2014): 122–147. doi:10.1093/jahist/jau184.
Daston, Lorraine, and Peter Galison. Objectivity. New York: Zone Books; distributed by the MIT Press, 2007.
Gitelman, Lisa, ed. “Raw Data” Is an Oxymoron. Infrastructures Series. Cambridge, Massachusetts: The MIT Press, 2013.
Goldstone, Andrew, and Ted Underwood. “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us.” New Literary History 45, no. 3 (2014): 359–384. doi:10.1353/nlh.2014.0025.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning with Applications in R. New York: Springer, 2013. http://dx.doi.org/10.1007/978-1-4614-7138-7.
Jockers, Matt. Text Analysis with R for Students of Literature. Springer, 2014. http://www.springer.com/statistics/computational+statistics/book/978-3-319-03163-7.
Klein, Lauren F. “The Image of Absence: Archival Silence, Data Visualization, and James Hemings.” American Literature 85, no. 4 (2013): 661–688. Accessed January 14, 2015. doi:10.1215/00029831-2367310.
Mosteller, Frederick, and David L. Wallace. “Inference in an Authorship Problem: A Comparative Study of Discrimination Methods Applied to the Authorship of the Disputed Federalist Papers.” Journal of the American Statistical Association 58, no. 302 (1963): 275–309. http://www.tandfonline.com/doi/abs/10.1080/01621459.1963.10500849.
Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2011.
Rhody, Lisa M. “Topic Modeling and Figurative Language.” Journal of Digital Humanities 2, no. 1 (April 7, 2013). http://journalofdigitalhumanities.org/2-1/topic-modeling-and-figurative-language-by-lisa-m-rhody/.
Schmidt, Benjamin. “Stable Random Projection: Lightweight, General-Purpose Dimensionality Reduction for Digitized Libraries.” Journal of Cultural Analytics (2018). doi:10.22148/16.025.
Tukey, John W. Exploratory Data Analysis. Addison-Wesley Series in Behavioral Science. Reading, Mass: Addison-Wesley Pub. Co, 1977.
Underwood, Ted, David Bamman, and Sabrina Lee. “The Transformation of Gender in English-Language Fiction.” Journal of Cultural Analytics (2018). doi:10.22148/16.019.
Wickham, Hadley. “Ggplot2.” Wiley Interdisciplinary Reviews: Computational Statistics 3, no. 2 (2011): 180–185. doi:10.1002/wics.147.
Wilkens, Matthew. “The Geographic Imagination of Civil War-Era American Fiction.” American Literary History 25, no. 4 (2013): 803–840. Accessed January 15, 2015. doi:10.1093/alh/ajt045.
Witmore, Michael. “Text: A Massively Addressable Object,” December 31, 2010. http://winedarksea.org/?p=926.
The algorithms we will discuss in the second half of the semester are covered at greater length in this text. If you want a more mathematical understanding, it provides a relatively gentle introduction in machine-learning terms, with some of the math we’ll gloss over in this class; it also uses the R language. All chapters are available for download, for free, from the Northeastern library; download any now that you find helpful.↩
For those interested solely in text analysis and not census, bibliographic, or other forms of “humanities data,” this may be valuable. But be aware that it uses a different set of libraries and data models for visualization and analysis than the ones we use in this class, so its code is unlikely to work without modification.↩