Chapter 4 Demonstration Files

We’ll be working with some files in the workshop.

Choose the ones that interest you: I’ve put up a list on Dropbox here.

Try to download two or three of these before the workshop, or during its first few minutes.

These files hold a reduced vocabulary of about 50,000 words each, rather than the full vocabulary, to keep them practical to download.

  1. “The_Donald.bin”: 150-dimensional embeddings trained using word2vec on the 2016 posts to the Reddit subreddit “The_Donald”.
  2. “books.bin”: 150-dimensional embeddings trained using word2vec on the 2016 posts to the Reddit subreddit “books”.
  3. “politics.bin”: 150-dimensional embeddings trained using word2vec on the 2016 posts to the Reddit subreddit “politics”.
  4. “streets.bin”: Something weirder: street names, not actual language.
  5. “glove.bin”: The default vectors trained with the “GloVe” algorithm.
  6. “teaching.bin”: Vectors trained on 15 million teaching evaluations from RateMyProfessors.com.
  7. A bunch of files with years: individual chunks of the Hansard Corpus of British parliamentary debates.
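
If you want to check that a download worked before the workshop, here is a minimal sketch, in Python, of reading one of these files. It assumes the files follow the usual word2vec binary layout: a text header giving the vocabulary size and the number of dimensions, then each word followed by a space and its vector as 32-bit floats. That layout is an assumption about these particular files, and the function name `read_word2vec_bin` and its `limit` parameter are made up for illustration; they are not part of any library.

```python
import struct


def read_word2vec_bin(path, limit=None):
    """Read a word2vec-style binary file into a {word: [floats]} dict.

    Hypothetical helper: assumes the standard word2vec binary layout and
    little-endian 32-bit floats (what the tool writes on typical hardware).
    """
    vectors = {}
    with open(path, "rb") as f:
        # Header is a text line: "<vocab_size> <dimensions>\n"
        vocab_size, dim = (int(x) for x in f.readline().split())
        count = vocab_size if limit is None else min(limit, vocab_size)
        for _ in range(count):
            # Each word is stored as bytes terminated by a single space.
            chars = bytearray()
            while True:
                ch = f.read(1)
                if not ch:
                    # Reached end of file early: return what we have.
                    return vectors, dim
                if ch == b" ":
                    break
                if ch != b"\n":  # skip the newline trailing the previous vector
                    chars.extend(ch)
            word = chars.decode("utf-8", errors="replace")
            # The vector follows as `dim` 32-bit floats.
            data = f.read(4 * dim)
            if len(data) < 4 * dim:
                return vectors, dim  # truncated file
            vectors[word] = list(struct.unpack("<%df" % dim, data))
    return vectors, dim
```

Calling `read_word2vec_bin("books.bin", limit=5)` should then give back the first five words with their vectors, along with the number of dimensions, which is enough to confirm that the file is intact and that the dimensions match the description in the list above.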