Notes: In this course we’ll mostly be reading articles that are available online. None are required for purchase. If you have difficulty obtaining any texts, please let me know as soon as possible. In week 1, you’ll read my spiel about what humanists need to understand when they read CS. My answer, in general, is that you need to know what they did, but not how they did it. I’ve put some CS papers in this syllabus to expand your thinking about what’s possible. You should absolutely, positively not aim to understand the process in a CS paper. As a rule, if you see a fancy equation in an article not written by a humanist, you can probably skip the whole section for the time being.
Introductions
Due Mon, Jan 24: Install R and RStudio on your computer. RStudio is a wrapper program around the R language that we’ll be using for almost every assignment.
What is (could be) Humanities Data Analysis?
Readings
Online text
Due Mon, Jan 31: Choose two datasets to discuss in class that are relevant to your research interests, so far as you’re able to find them.
One should be something that you can actually download, almost certainly in the form of a CSV or Excel File.
The other should be something that you know exists, but that you might not be fully able to work with yet.
For both of them, fill out the online spreadsheet. The goal here is to reduce this to a tabular dataset. Describe what each of the columns in this dataset would be.
Do not describe the dataset as a whole aside from the columns; see if you can capture it in the individual elements.
Due Mon, Jan 31: Try to finish the exercises for “Working in a Programming Language,” installing R and Rstudio.
Information %>% Data
Readings
Online text
agenda: Class agenda
Data Visualization
Readings
Online text
Practicum for next class
ggplot2
Related texts not to read
Due Wed, Feb 16: Counting things
No class: President’s Day
Counting, grouping, and accounting for how only things that get counted count.
description: A huge amount of work is just about finding interesting things to count. Often, sophisticated work can just be figuring out how to count something new. Here we look a little bit at how you can simply count something.
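As a minimal sketch of what counting and grouping can look like, here is a toy tally of books by decade, and then by decade and genre. It is written in Python rather than the R used in the course exercises, and the records and field names are invented purely for illustration.

```python
from collections import Counter

# Hypothetical toy records; titles, decades, and genres are made up.
books = [
    {"title": "A", "decade": 1850, "genre": "novel"},
    {"title": "B", "decade": 1850, "genre": "poetry"},
    {"title": "C", "decade": 1860, "genre": "novel"},
    {"title": "D", "decade": 1860, "genre": "novel"},
]

# Count one thing: how many books fall in each decade.
by_decade = Counter(book["decade"] for book in books)

# Group by two things at once: decade and genre together.
by_decade_genre = Counter((book["decade"], book["genre"]) for book in books)

for (decade, genre), n in sorted(by_decade_genre.items()):
    print(decade, genre, n)
```

The interesting decision is usually not the tally itself but what gets a column: anything left out of the records (here, say, the publisher) cannot be counted at all.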
Readings
Online text
Related texts not to read
Practicum for next class
Making data work together
Readings
Online text
Practicum for next class
No class: Spring Break
Text as Data, 1
Readings
Practicum for next class: “Texts as Data, exercises.”
Due Fri, Mar 25: Place on the course Slack two ggplot visualizations that result from a join between two different datasets. Try to be goofy in one and serious in the other. You may use text fields if you want.
Text as Data, 2
Readings
Online text for this class session
agenda: Class agenda
Due Fri, Apr 01: Place on the course Slack two ggplot visualizations that result from a join between two different datasets. Try to be goofy in one and serious in the other. You may use text fields if you want.
Space as Data
Readings
Online text for this class
Due Wed, Apr 06: Free exercise: use a bag-of-words approach on texts of your own choosing and explore comparisons between subsets using PMI or Dunning. These can be full-text, XML, or, if you prefer, word counts for books from the HathiTrust as described in the online text. Post as images or tables to the Slack channel #getting-text-files.
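For orientation, here is a rough sketch of the PMI comparison named in this exercise. It assumes the usual definition, the base-2 log of a word's observed count in a subset divided by the count expected if the word were spread evenly across subsets. The function name, the toy texts, and the use of Python instead of the course's R are all assumptions for illustration, not the method from the online text.

```python
import math
from collections import Counter


def pmi_by_subset(docs):
    """Return {(word, subset): PMI} for a mapping of subset name -> token list."""
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    subset_totals = {name: sum(c.values()) for name, c in counts.items()}

    # Total occurrences of each word across all subsets.
    word_totals = Counter()
    for c in counts.values():
        word_totals.update(c)
    grand_total = sum(subset_totals.values())

    scores = {}
    for name, c in counts.items():
        for word, observed in c.items():
            # Count we would expect if the word were independent of the subset.
            expected = word_totals[word] * subset_totals[name] / grand_total
            scores[(word, name)] = math.log2(observed / expected)
    return scores


# Hypothetical toy "corpus": two subsets of whitespace-split tokens.
toy = {
    "dogs": "the dog barks the dog runs".split(),
    "cats": "the cat sleeps the cat purrs".split(),
}
scores = pmi_by_subset(toy)
```

In this toy case a word shared evenly by both subsets ("the") scores 0, while a word found only in one of two equal-sized subsets ("dog") scores 1. PMI tends to reward rare words, which is one reason to compare it against Dunning's log-likelihood.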
Dogs as Data
description: I think we need a little reboot, so we’ll focus on dogs for a little bit. Claim a possible question in the Slack as described there. It’s OK if you can’t fully realize what you want to do, but you must try something and post your questions and your broken code.
Readings
Due Mon, Apr 11: Download a shapefile or geojson from the Internet, read it into R, and make a map that you are confident no one has made before. Post in Slack.
Due Mon, Apr 11: Identify data/datasets you’ll be working with for the rest of the class
Supervised Learning and Predictive Models
note: From this point on, the weekly readings and topics are about specific applications of algorithms to different types of problems. To this point, everything we’ve done has been foundational. From here on out, it’s more about specific applications that you can use if you want, but don’t necessarily need.
Class agenda
Readings
online text: Classification.
Clustering, topic modeling, and unsupervised approaches
Readings
In class agenda
Due Mon, Apr 25: due
Due Mon, Apr 25: text
The Embedding Strategy and representation learning.
description: Modern machine learning requires data, but that data doesn’t just look like an XML or TEI representation. Instead, a particular trick for turning items into strings of numbers, the embedding strategy, has emerged as the dominant way for computers to represent information to themselves.
Readings
Assignment for this class
Online text
Going deep
Readings
Class agenda
Allison, Sarah, Ryan Heuser, Matthew L. Jockers, Franco Moretti, and Michael Witmore. “Quantitative Formalism: An Experiment (Stanford Literary Lab, Pamphlet 1).” Stanford: Stanford Literary Lab, January 15, 2011.
Blevins, C. “Space, Nation, and the Triumph of Region: A View of the World from Houston.” Journal of American History 101, no. 1 (2014): 122–47. https://doi.org/10.1093/jahist/jau184.
Daston, Lorraine, and Peter Galison. Objectivity. New York; Cambridge, Mass.: Zone Books; Distributed by the MIT Press, 2007.
Drucker, Johanna. “Humanities Approaches to Graphical Display.” Digital Humanities Quarterly 5, no. 1 (2011). http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. “Deep Learning.” Nature 521, no. 7553 (May 2015): 436–44. https://doi.org/10.1038/nature14539.
Logan, Trevon D., and John M. Parman. “The National Rise in Residential Segregation.” The Journal of Economic History 77, no. 1 (March 2017): 127–70. https://doi.org/10.1017/S0022050717000079.
Michel, Jean-Baptiste, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K Gray, Joseph P Pickett, Dale Hoiberg, et al. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science (New York, N.Y.) 331, no. 6014 (January 14, 2011): 176–82. https://doi.org/10.1126/science.1199644.
Rosenberg, Daniel. “Data Before the Fact.” In Raw Data Is an Oxymoron, edited by Lisa Gitelman. Cambridge: MIT Press, 2013.
Tukey, John W. Exploratory Data Analysis. Addison-Wesley Series in Behavioral Science. Reading, Mass: Addison-Wesley Pub. Co, 1977.
Underwood, Ted, David Bamman, and Sabrina Lee. “The Transformation of Gender in English-Language Fiction.” Journal of Cultural Analytics, 2018. https://doi.org/10.22148/16.019.
Unsworth, John. “Knowledge Representation in Humanities Computing,” 2001. http://www.people.virginia.edu/~jmu2m/KR/KRinHC.html.
Witmore, Michael. “Text: A Massively Addressable Object,” December 31, 2010. http://winedarksea.org/?p=926.