How are you getting along in Fundies?

(Or, looking back, how do you feel about Fundies?)

What's working in the class for you?

What's alienating, difficult, or frustrating?

Computer languages

Programming Language Family Trees

Longer version

Why do you choose a language?

  • Conformity with your team
  • Pre-existing libraries
  • Ease of maintenance
  • Ease of development
  • Speed


Reading R Code

  1. Function names come before the parentheses:
    log(5) instead of (log 5).
  2. Comments start with the # sign.
  3. R encourages literate programming.
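
A minimal sketch of the first two points, using only base R. The variable names are made up for illustration.

```r
# Comments start with the # sign and run to the end of the line.

# The function name comes before the parentheses: log(5), not (log 5).
natural_log <- log(5)

# Extra arguments go inside the same parentheses, separated by commas.
rounded_log <- round(natural_log, digits = 2)

print(rounded_log)  # prints [1] 1.61
```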

Why R? It's widely used in the humanities and social sciences.

  • But why?
  • One useful data structure: the data frame.
  • Through libraries, a useful syntax for data processing.
  • Other libraries to do things in statistics, visualization, etc.

Reading Racket code:

Functions evaluate from the inside out.

(* 3
  (- 1
    (/ 2
      (+ 1 7))))

Reading (tidy) R code:

The pipe operator (%>%) runs from left to right.

sum(1, 7) %>%
  divide_by(2) %>%
  subtract(1)
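
To make the contrast concrete, here is a rough sketch that writes a short chain like the one above both ways. It assumes the magrittr package is installed; `divide_by` and `subtract` are magrittr's aliases for `/` and `-`, and the variable names are only illustrative.

```r
library(magrittr)  # provides %>% and the arithmetic aliases used below

# Nested calls read from the inside out, as in Racket.
nested <- subtract(divide_by(sum(1, 7), 2), 1)

# The same steps written with the pipe read from left to right:
# each result becomes the first argument of the next function.
piped <- sum(1, 7) %>%
  divide_by(2) %>%
  subtract(1)

# Both forms compute (1 + 7) / 2 - 1.
print(nested)
print(piped)
```

Both versions do the same work; the pipe only changes the order in which you read it.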

Advice on re-using code.

  1. Copy and paste, a lot.
  2. Check in with your neighbors.
  3. Change one part of the code at a time.
  4. If you don't break anything, you're not trying hard enough.
  5. Google the errors.

The data frame is the core data structure that keeps R in use. It's like the data displayed in a spreadsheet: named columns, one row per record.

tibble(candidates = c("Clinton", "Trump", "Johnson"), polling = c(45, 40, 5), party = c("D", "R", "I"))
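
A small sketch of what you can do once the data is in a data frame. To stay self-contained, it uses base R's `data.frame()` as a stand-in for the tidyverse constructor; the names `polls`, `mean_polling`, and `leaders` are made up for illustration.

```r
# Base R's data.frame() stands in here for the tidyverse constructor.
polls <- data.frame(
  candidates = c("Clinton", "Trump", "Johnson"),
  polling = c(45, 40, 5),
  party = c("D", "R", "I"),
  stringsAsFactors = FALSE
)

# Each column is a vector, reachable by name with $.
mean_polling <- mean(polls$polling)

# Rows can be filtered with a logical condition on a column.
leaders <- polls[polls$polling >= 40, ]

print(mean_polling)
print(leaders$candidates)
```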

Only problem: text looks like this:

UNDER the shadow of Boston State House, turning its back on the house of John Hancock, the little passage called Hancock Avenue runs, or ran, from Beacon Street, skirting the State House grounds, to Mount Vernon Street, on the summit of Beacon Hill; and there, in the third house below Mount Vernon Place, February 16, 1838, a child was born, and christened later by his uncle, the minister of the First Church after the tenets of Boston Unitarianism, as Henry Brooks Adams.

Bag of words

Bags of words retain a lot of meaning

Bag of words in R:
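
One possible way to build a bag of words, sketched in base R on a fragment of the passage quoted earlier. The tokenizing rule (lowercase, split on non-letters) is a simplifying assumption, not necessarily what a text-analysis library would do.

```r
# A fragment of the passage quoted above.
text <- "UNDER the shadow of Boston State House, turning its back on the house of John Hancock, the little passage called Hancock Avenue runs, or ran, from Beacon Street"

# Lowercase the text, then split on anything that is not a letter.
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
words <- words[words != ""]

# Count how often each word occurs; word order is discarded.
bag <- sort(table(words), decreasing = TRUE)

print(head(bag, 5))
```

The result is effectively a data frame waiting to happen: one row per word, one column for its count.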


Named entity extraction finds places or people in unstructured text.

Topic modeling assumes that documents are composed of a few underlying topics.
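
The assumption can be sketched with toy numbers in base R. Everything here is hypothetical: the two topics, the four-word vocabulary, and the probabilities are invented for illustration, and the code shows only the "documents are mixtures of topics" idea, not how a model is fitted.

```r
# Hypothetical toy numbers, not fitted to any real corpus.
# Each row is a topic: a probability distribution over a tiny vocabulary.
topics <- rbind(
  genetics  = c(gene = 0.5, dna = 0.4, star = 0.05, orbit = 0.05),
  astronomy = c(gene = 0.05, dna = 0.05, star = 0.5, orbit = 0.4)
)

# A document is described by how much of each topic it contains.
doc_mix <- c(genetics = 0.8, astronomy = 0.2)

# The expected word probabilities are the mixture of the topic rows.
word_probs <- doc_mix %*% topics

print(round(word_probs, 2))
```

A real topic model runs this logic in reverse: it sees only the word counts and estimates both the topics and each document's mix.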

A topic model of science articles

word2vec positions words so that relationships can be represented spatially.
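
A toy illustration of "relationships represented spatially", in base R. The two-dimensional vectors below are chosen by hand and are purely hypothetical; a trained word2vec model learns its vectors from text and uses many more dimensions.

```r
# Hypothetical two-dimensional vectors chosen by hand for illustration;
# a trained word2vec model would have hundreds of dimensions.
vectors <- rbind(
  king  = c(0.9, 0.8),
  queen = c(0.9, 0.2),
  man   = c(0.1, 0.8),
  woman = c(0.1, 0.2)
)

# Cosine similarity: 1 means the vectors point in the same direction.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# "king" - "man" + "woman" should land near "queen" if the
# gender relationship is encoded as a consistent offset.
target <- vectors["king", ] - vectors["man", ] + vectors["woman", ]

# Compare the target with every word and keep the closest one.
similarities <- apply(vectors, 1, cosine, b = target)
closest <- names(which.max(similarities))
print(closest)
```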

A reduction of a complicated word2vec space