This is a clone I have made of a page by David Mimno that shows themes in the topic model from Matt Jockers' book Macroanalysis. (You can also read Matt's posts here.) Mimno recently put this online, but I've adapted it to display the topics in two dimensions rather than one. Instead of just sorting the lines from decreasing to increasing, I use the horizontal axis to show how strongly a topic decreases (or increases) over the course of a novel, and the vertical axis to show whether it bumps up (or down) in the middle while staying flat at the ends. This is determined by a straight Pearson correlation against the numbers 1:20, with no PCA involved. The reason I've been using PCA, as I'll say later in more depth, is that I hope it can reveal these arcs that run top to bottom (which roughly correspond to the second PC on my PCA plots). But they're already there in the topics. (This could be an artifact of topic modeling, and some of it certainly is: certain topics are probably less likely to jump chunk boundaries, and those will be start-end topics. But I doubt that's the whole effect.)
I see fewer of these in Jockers' topics than in the ones I wrote about earlier. Since Jockers' model is better (more carefully composed, inspected by hand, etc.), this may indicate that my model is particularly likely to pull out middle-y topics; it might also mean that TV shows and novels are different.
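
As a rough sketch of the placement described above: each topic has a 20-number profile (its average weight in each of the 20 sections), and its horizontal position is the Pearson correlation of that profile with 1 through 20. The vertical template is not spelled out in this post, so the code below assumes a symmetric arc that peaks in the middle; `ARC`, `pearson`, and `place_topic` are names made up for illustration, not taken from the page's actual code.

```python
import math

SECTIONS = list(range(1, 21))  # the numbers 1:20

# ASSUMPTION: the post does not say what the vertical axis is correlated
# against. A symmetric arc peaking mid-novel is one plausible template.
ARC = [-(s - 10.5) ** 2 for s in SECTIONS]


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def place_topic(profile):
    """Return (x, y) for one topic's 20-section profile.

    x is the correlation with 1:20 (negative means the topic fades
    over the course of a novel); y is the correlation with the
    assumed arc (positive means the topic bumps up in the middle).
    """
    return pearson(profile, SECTIONS), pearson(profile, ARC)
```

Because the straight line and the symmetric arc are uncorrelated with each other over 1:20, a topic that purely declines gets a y of about zero, and a topic that purely bumps in the middle gets an x of about zero.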

In italic, here's Mimno's introduction, with a bit chopped out.

This page shows the 500 topics used by Matt Jockers in his book Macroanalysis, which he extracted automatically from ~3000 mostly 19th century English-language novels. The algorithm divided the content of the novels into themes based on word co-occurrence patterns within 1000-word chunks.

Each 1000-word chunk is assigned to one of 20 equal-sized sections. (Long books will have 20 long sections, shorter books will have 20 shorter sections.) A topic that occurs more in early sections of novels will have a decreasing line, like school, while a topic that occurs in later sections will have a rising line, like punishment. The topics are sorted so that topics that occur early in novel time are at the top of the page, and topics that occur late are at the bottom. The gray shaded region represents one standard deviation on either side of the mean — there's a lot of variability between novels.
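
The sectioning step in the introduction above can be sketched roughly as follows. The exact binning rule is not given there, so this assumes the simplest one: a chunk's section is its position in the book scaled to 20 bins. The function names are hypothetical.

```python
def section_of(chunk_index, n_chunks, n_sections=20):
    """Map a 0-based chunk index to a 0-based section number.

    ASSUMPTION: sections are equal slices of the book by chunk count.
    """
    return chunk_index * n_sections // n_chunks


def section_means(chunk_weights, n_sections=20):
    """Average one topic's per-chunk weights within each section of one book.

    Sections with no chunks (books shorter than n_sections chunks)
    come back as None rather than as a made-up value.
    """
    totals = [0.0] * n_sections
    counts = [0] * n_sections
    n_chunks = len(chunk_weights)
    for i, weight in enumerate(chunk_weights):
        s = section_of(i, n_chunks, n_sections)
        totals[s] += weight
        counts[s] += 1
    return [t / c if c else None for t, c in zip(totals, counts)]
```

A 100-chunk book ends up with five chunks per section, while a 40-chunk book gets two per section, which is the sense in which long books have long sections and short books have short ones.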

Unlike Mimno's version, you have to click a mini plot to see its labels, since I'm embedding the plots in a two-dimensional space. You also have to click the expanded plot to make it disappear, since that was easier for me to code.

I apologize for the lack of x and y axes on the overall thing. The Pearson range for the x axis is -.98 to -.96, and for the y axis -.68 to .86. This was a train-ride project, and I'm pulling into South Station now.

Where do themes occur in novels?

Once again, click to expand the lines and labels.