Chapter 4 Demonstration Files
We’ll be working with some files in the workshop.
Choose the ones that interest you: I’ve put up a list on Dropbox here.
Try to download 2 or 3 of these before the workshop or during the first few minutes of it.
These files are trimmed to a smaller vocabulary than the full models, about 50,000 words, so that they are quick to download.
- “The_Donald.bin”: 150-dimensional embeddings trained using word2vec on the 2016 posts to the Reddit subreddit “The_Donald”.
- “books.bin”: 150-dimensional embeddings trained using word2vec on the 2016 posts to the Reddit subreddit “books”.
- “politics.bin”: 150-dimensional embeddings trained using word2vec on the 2016 posts to the Reddit subreddit “politics”.
- “streets.bin”: Something weirder: street names, not actual language.
- “glove.bin”: The default vectors trained with the “GloVe” algorithm.
- “teaching.bin”: Vectors trained on 15 million teaching evaluations from RateMyProfessors.com.
- A bunch of files with years: individual chunks of the Hansard Corpus of British parliamentary debates.
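If you want to check that a download arrived intact, it can help to peek at the first few words and confirm the vector length. The sketch below is a minimal reader that assumes the files use the binary layout written by the original word2vec tool (a text header with the vocabulary size and dimension count, then each word followed by a space and its 32-bit floats). That layout is an assumption about these particular files, not something confirmed here, and the function name `read_word2vec_bin` and the tiny sample vectors are made up for illustration.

```python
import io
import struct
from typing import BinaryIO, Dict, List, Optional


def read_word2vec_bin(stream: BinaryIO, limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Read word vectors from a stream, ASSUMING the word2vec binary layout."""
    # Header line: ASCII "<vocabulary size> <dimensions>\n".
    vocab_size, dims = (int(n) for n in stream.readline().split())
    count = vocab_size if limit is None else min(limit, vocab_size)
    vectors: Dict[str, List[float]] = {}
    for _ in range(count):
        # Each word is stored as raw bytes terminated by a single space.
        chars = bytearray()
        while True:
            ch = stream.read(1)
            if not ch:
                raise EOFError("file ended in the middle of a word")
            if ch == b" ":
                break
            if ch != b"\n":  # some writers put a newline after each vector
                chars.extend(ch)
        # The vector follows as `dims` little-endian 32-bit floats.
        raw = stream.read(4 * dims)
        if len(raw) != 4 * dims:
            raise EOFError("file ended in the middle of a vector")
        word = chars.decode("utf-8", errors="replace")
        vectors[word] = list(struct.unpack(f"<{dims}f", raw))
    return vectors


# Build a tiny in-memory file in the assumed layout (made-up 3-dimensional data),
# then read it back to show the round trip.
sample = {"street": [0.5, -1.0, 2.0], "avenue": [1.5, 0.25, -4.0]}
buffer = io.BytesIO()
buffer.write(f"{len(sample)} 3\n".encode("ascii"))
for word, values in sample.items():
    buffer.write(word.encode("utf-8") + b" ")
    buffer.write(struct.pack("<3f", *values))
    buffer.write(b"\n")
buffer.seek(0)

loaded = read_word2vec_bin(buffer)
print(loaded)
```

With a real download you would open the file in binary mode, for example `open("books.bin", "rb")`, and pass `limit=5` to read only the first few entries; if the format assumption holds, each vector for the Reddit files should have 150 values. If you have the gensim library installed, `KeyedVectors.load_word2vec_format(path, binary=True)` reads the same layout and is the more practical choice for real work.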