3-Word Embeddings

A simple program to begin.

After starting your VM and running the update code (which downloads the file you need here), copy and paste the following block.

library(wordVectors)
library(magrittr)

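# Read the first 50,000 rows of the pre-trained vectors file
vectors = read.binary.vectors("/texts/medical_vectors.bin", nrows = 50000)

# List the 10 words nearest to "tumor"
vectors %>% nearest_to(vectors[["tumor"]], n = 10)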

If that prints the ten words nearest to "tumor", your setup is working.
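
To get a feel for what a nearest-word lookup like the one above involves, here is a minimal base-R sketch on a tiny invented matrix. It assumes the ranking is by cosine similarity, which is the usual choice for word vectors. The numbers and the helper names `cosine_to` and `toy_nearest` are made up for illustration and are not part of wordVectors.

```r
# Toy 2-D "embeddings"; the numbers are invented purely for illustration.
toy = rbind(
  tumor  = c(0.9, 0.1),
  cancer = c(0.8, 0.2),
  lesion = c(0.7, 0.4),
  nurse  = c(0.1, 0.9)
)

# Cosine similarity between every row of m and a single vector v.
cosine_to = function(m, v) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# Hypothetical helper: the n rows most similar to v, best first.
toy_nearest = function(m, v, n = 3) {
  sims = cosine_to(m, v)
  names(sims) = rownames(m)
  head(sort(sims, decreasing = TRUE), n)
}

toy_nearest(toy, toy["tumor", ], n = 3)
```

The query word itself comes out on top, since any vector has a cosine similarity of 1 with itself.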

A much more complicated program, for later or for speed demons.

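# Each opposition is the difference between two word vectors
opposition_1 = vectors[["he"]] - vectors[["she"]]
opposition_2 = vectors[["america"]] - vectors[["england"]]

# The names of the 100 words nearest to the five field names combined
field_words = vectors %>% nearest_to(
  vectors[[c("dermatology","pediatrics","ophthalmology","psychology","histology")]],100
  ) %>% names

# Keep only the rows for those words
smaller_vectors = vectors %>% filter_to_rownames(field_words)

# Score each remaining word against the two oppositions
similarity_to_1 = smaller_vectors %>% cosineSimilarity(opposition_1)
similarity_to_2 = smaller_vectors %>% cosineSimilarity(opposition_2)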

# We plot in white so the circles don't leave a mark

plot(x=similarity_to_1,y=similarity_to_2,col='white')

# And then use 'text' to actually write on the screen.

text(x=similarity_to_1,y=similarity_to_2,labels=rownames(similarity_to_2))
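
The opposition step in the program above can be hard to picture on real data, so here is a small base-R sketch of the same idea: subtract one pole word from the other, then score other words by their cosine similarity to that difference. All vectors are invented, and `term_a`, `term_b`, and the `cosine` helper are hypothetical names used only for this illustration.

```r
# Invented 2-D vectors, for illustration only.
toy = rbind(
  he     = c( 1.0, 0.2),
  she    = c(-1.0, 0.2),
  term_a = c( 0.6, 0.8),
  term_b = c(-0.5, 0.9)
)

# Cosine similarity between two plain numeric vectors.
cosine = function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# The opposition is the difference between the two pole words.
opposition = toy["he", ] - toy["she", ]

# Score the remaining words against that direction.
others = toy[c("term_a", "term_b"), ]
scores = apply(others, 1, cosine, b = opposition)
scores
```

A positive score means the word leans toward the first pole ("he" here) and a negative score means it leans toward the second. Doing this for two oppositions gives each word an x and a y coordinate, which is what the plot draws.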