You are looking at content from Sapping Attention, which was my primary blog from 2010 to 2015; I am republishing all items from there on this page, but for the foreseeable future you should be able to read them in their original form at sappingattention.blogspot.com. For current posts, see here.

Plot arceology 2016: emotion and tension

Jul 18 2016

Some scientists came up with a list of the 6 core story types. On the surface, this is extremely similar to Matt Jockers's work from last year. Like Jockers, they use a method for disentangling plots that is based on sentiment analysis, justify it mostly with reference to Kurt Vonnegut, and choose a method for extracting ur-shapes that naturally but opaquely produces harmonic-shaped curves. (Jockers uses the Fourier transform; the authors here use SVD.) I started writing up some thoughts on this two weeks ago, stopped, and then got a media inquiry about the paper, so I thought I'd post my concerns here. These sort of ramp up from the basic but important (only about 40% of the texts they are using are actually fictional stories) to the big one that ties back into Jockers's original work: why use sentiment analysis at all? This leads back into a sort of defense of my method of topic trajectories for describing plots and some bigger requests for others working in the field.

Basic methodology

  1. They use some unspecified mechanism to limit the ~50,000 books in Project Gutenberg to 1,700 stories or works of fiction; they use these terms interchangeably. But this suffers from two problems.

1a. First, whatever fiction/nonfiction classifier they use seems to work extraordinarily poorly, almost certainly worse than simply using the Library of Congress classifications that Gutenberg itself distributes with some of its dumps. It includes personal narratives, political essays, instructions for building bird houses, psychology texts, and so forth. If you click the "random" button on their page (which is a great thing to include), you'll see many of these.

1b. It also includes many collections of short stories or pairs of novels published in a single volume. Some of these are the highest scoring plots for their basic arcs: for instance, The Wonder Book of Bible Stories is the best instance of the inverse of plot 3, and only 1 of the top 5 representatives of (-SV 2) seems to actually be a single narrative.

I ran a spot check of 50 random texts in their browser. I counted 18 non-fiction; 20 novels or short stories; and 12 collections of short stories, or other multi-work texts. So roughly 40% of the texts used are actually what the authors say they are. This makes the conclusions only provisional at best. So many of the titles in the captions are obviously not stories that it's a little baffling they didn't bother to clean up their data set, or use one of the many *actual* fiction collections out there. [Edit: I noticed in the appendix that they classify fiction on the basis of length and download count. How they chose the parameters they use isn't clear to me; in any case, it's obvious that just length and download count are *terrible* inputs into a fiction/nonfiction classifier, so it's no wonder they do so poorly.]
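A sample of 50 is small, so it is worth asking how far the true share of single-narrative fiction could plausibly be from 40%. The following is a minimal sketch, assuming the 50 texts were an unbiased random draw; the Wilson score interval and the function name `wilson_interval` are choices made here for illustration, not anything used in the paper or in the spot check itself.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 20 of the 50 sampled texts were single novels or short stories.
low, high = wilson_interval(20, 50)
print(f"{low:.2f} to {high:.2f}")
```

Under that assumption the interval runs from a bit under 30% to a bit over 50%, so even the optimistic end leaves about half the corpus as something other than a single story.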

  2. The null hypothesis that they test against is "word salad": a completely reshuffled set of orders. They do indeed seem to show that their stories have stronger shapes than word salads. But this is an extremely weak finding. It's akin to saying that you can predict the stock market because you can show that stock prices exhibit greater regularity than random digits. Of course stocks are not random dice rolls every second; they have a trajectory that they move from randomly. But for a time series like this, I think the null hypothesis should be at the least a random walk, not complete random words: that is, particularly when using normalized scores as here, the assumption should be that any given paragraph has the same emotional valence as the previous paragraph, not a completely new one. That is to say, it is easy to generate a null narrative that is distinct from a null text. This is not to say that there isn't some benefit to checking the weaker null hypothesis first. [Although see below in the comments: Scott Enderle suggests that the random noise they get shouldn't be producing results like it is. So what's going on is yet more unclear.]

Another question about plot as a time series is: can you predict what will happen? No one working in the field, to my knowledge, has tried to do this, but it could be interesting. In terms of emotional valences, this makes clear, I think, why the word salad null hypothesis is silly; if you want to predict the end of the book from the middle and beginning, you could do better than to say "It will start randomly vacillating every word from negative to positive and so on."
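The difference between the two nulls is easy to see in simulation. This is a sketch under stated assumptions: sentiment scores are stood in for by Gaussian noise, and the sizes (200 "books", 100 windows each) and the helper `lag1_autocorr` are invented for illustration. It does not reproduce the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_books, n_steps = 200, 100

# "Word salad" null: every window's score is drawn independently.
word_salad = rng.normal(size=(n_books, n_steps))
# Random-walk null: every window starts from the previous window's score.
random_walk = np.cumsum(rng.normal(size=(n_books, n_steps)), axis=1)

def lag1_autocorr(series):
    """Correlation between each point in a series and the point before it."""
    x = series - series.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

salad_ac = np.mean([lag1_autocorr(s) for s in word_salad])
walk_ac = np.mean([lag1_autocorr(s) for s in random_walk])
print(f"word salad: {salad_ac:.2f}, random walk: {walk_ac:.2f}")
```

The word-salad series have essentially no memory from one window to the next, while the random walks are strongly autocorrelated without containing any plot at all. Showing more structure than the first says very little; the second is the harder test.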

  3. They decide to test "success" by number of downloads, and argue that shapes (SV 3) and (-SV 3) are most successful because they have markedly higher downloads, and somewhat higher variance. The designation as higher is based entirely on mean downloads, since the medians are roughly the same. If mean and median don't tell the same story, there's probably something else going on. Maybe there's simply more variance, for example, and the number of downloads varies log-normally. When the summary statistics don't agree, it's a stretch to claim any actual conclusions.
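The log-normal possibility raised in the last point can be illustrated with simulated data. This sketch assumes, purely for illustration, two groups of download counts with the same log-scale location and different spreads; the parameter values are made up and are not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Same log-scale center, different log-scale spread (hypothetical values).
narrow = rng.lognormal(mean=4.0, sigma=0.5, size=n)
wide = rng.lognormal(mean=4.0, sigma=1.5, size=n)

print("medians:", np.median(narrow), np.median(wide))
print("means:  ", narrow.mean(), wide.mean())
```

The medians come out nearly identical while the mean of the wider group is much larger, driven by a small number of very large values. A higher mean with an unchanged median is what extra variance alone would produce, with no difference in typical popularity.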

What are we doing here?

Next on to some bigger questions of what it means to study plot. This and Jockers's are two of the more prominent things recently using sentiment analysis as a proxy for plot. I saw Ted Underwood on Twitter arguing that the next step must be following up on David Bamman's work on experimenting on whether sentiment analysis actually works by using Mechanical Turk to annotate the emotional trajectory of texts.

I'm basically done thinking about all of these; the combination of my paper on topic-modeling arcs and my meta-reflections on algorithms, plots, and the Jockers-Swafford affair of 2015 for Debates in the Digital Humanities 2016 gives most of what I have to say formally about the issue. There are some slides from the IEEE paper that have nice interactives about the beginnings and ends of TV shows. But I thought I'd just blog out a few additional directions I'd like to see followed up.

I have some issues with the idea of validating sentiment analysis results being especially useful for literary analysis, principally because I don't think that even perfectly working sentiment analysis would be a very good way to measure plot. Citing Vonnegut is a bit of a bait-and-switch; he writes about good fortune and ill fortune, not positive sentiment and negative sentiment. Sentiment analysis is already trained on large numbers of human samples of whether something is positive or negative; if we want to explicitly test Vonnegut's hypothesis, we ought to be building new models that classify text as fortunate and misfortunate, which should subtly differ from positive sentiment and negative sentiment.

Or we should be testing theories of plot that, unlike Vonnegut's, actually have any influence beyond a web video from a few years ago. (Vonnegut doesn't even strike me as a writer who was especially good at plot, to be honest.) Train an LSTM model on human-tagged data that can accurately extract the "call to adventure" or the reaching of the "innermost cave" from a script, and then we might have something interesting, because there's a real interplay between the stories we consume through mass media and popularizations of Joseph Campbell.

Of course that brings me to the final problem here, which is that you *can't* use Mechanical Turk to label stories by their Campbellian archetypes because ordinary readers don't speak in those terms. Is that a problem? Can we expect to find structures that most people wouldn't recognize?

I've said before that I think formal musical analysis is the real place to look here. One could, I imagine, try to classify every Beethoven sonata movement by its emotional trajectory; in some popular understanding of music that is what actually happens. But if macro-musicologists tried to do that, they'd obviously be missing out on the actual formal elements the composer was working with. Early 19th-century European music is organized tonally; a good model of its structure would look at tonal organization, not some nebulous notion of emotionality.

I do not believe there are general story principles as firm as classical-era sonata form. I do think that some combination of Joseph Campbell, commercial organization, and three-act structure conformism leaves contemporary television and movies somewhat predisposed to one or a few narratives that could be usefully explored. Which is why I think it's a huge strategic blunder for everyone working with plots to be looking at novels (probably the least coherent narrative form in existence) instead of any of the many other forms of narrative out there.

Even if there are master plots, I suspect they will be revealed as much in terms of tension as emotion. (Tension is also more easily analogized to classical form music, for better or worse, as dominant-tonic relationships.) A plot classifier shouldn't be looking at local emotion; it should be looking at arcs of introduction of tension and release. This requires a very different form of machine reading; every gun on every mantelpiece needs to be tracked until it goes off. (As with everything else these days, this seems structurally better suited for neural networks than the locally tokenized texts we're mostly working with.) Tension explains a wide variety of plots that none of the emotionally based mechanisms can. For example, the preponderance of plots in my TV and movie database are procedurals which are not organized around a single character's rise and fall; instead, they proceed from crime to punishment, from disease to cure, or from acquisition to sale.

I have no idea how to define tension. You could do it through Mechanical Turk, I guess. But what's really interesting is that we may be able to define it operationally. What sort of events in texts demand resolutions? What distinguishes beginnings from ends? These are more unsupervised questions than ones about emotional trajectories, and ones that might provide us with much more interesting questions to build on as well as answers.

Comments:

Ted Underwood - Jul 1, 2016

These are great reflections, and I agree with pretty much everything. To clarify: when I say we need more evidence about human reactions, I don't necessarily mean that we need to validate the sentiment analysis. That's one piece of it, a piece David Bamman covered well. But I agree that the bigger question is, "How much does sentiment actually tell us about plot?"

I want evidence about human responses (in the form of genre or popularity or *something*) mainly in order to address that part of the question, which you rightly identify as crucial. For instance, when I casually tried to use sentiment trajectories to distinguish Shakespearean tragedies and comedies, I didn't get significant results. That's a striking null. A sentiment-based method ought to be able to distinguish those two genres if it can distinguish *anything.*

There are lots of opportunities for further research here.

Ben - Jul 1, 2016

Ah, responded to this already on Twitter.

Treating this as a prediction problem would be one way to get at it. If someone were to say: which elements of a plot are predictive of it ending happily? Sadly? Then we might get away from treating plot time as one big mass of equally important stuff and finding which inflection points matter. Even sticking with sentiment analysis, I find it really unlikely that spreading sentiment over 100% really works well. These techniques make a huge distinction between a "man in hole" plot where he falls in the hole at 40% of the way through and one where he falls at 60% of the way through. Something less rigid might work better; predicting the end state would be a good start.
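The rigidity point can be made concrete with two toy arcs. This is a sketch only: the Gaussian-dip shape, its width, and the name `man_in_hole` are assumptions invented for illustration, and pointwise correlation stands in for whatever similarity a fixed-timeline method effectively uses.

```python
import numpy as np

def man_in_hole(dip_at, n_points=100, width=0.08):
    """Toy arc: flat fortune with a single dip centered at `dip_at` (0 to 1)."""
    t = np.linspace(0.0, 1.0, n_points)
    return -np.exp(-0.5 * ((t - dip_at) / width) ** 2)

early = man_in_hole(0.40)
late = man_in_hole(0.60)

# Pointwise correlation treats narrative time as a fixed 0-100% axis.
corr_same = np.corrcoef(early, man_in_hole(0.40))[0, 1]
corr_shifted = np.corrcoef(early, late)[0, 1]
print(f"same timing: {corr_same:.2f}, shifted timing: {corr_shifted:.2f}")
```

Two arcs that a reader would call the same story come out nearly uncorrelated once the dip moves by a fifth of the book, which is the kind of distinction a less rigid alignment would avoid.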

In the musical analogy, these models are all getting hung up on the modulations in the development. But if you listen to the exposition of a sonata, you know what the key signatures and thematic groups in the recapitulation are going to be, give or take a theme.

Likewise anyone who reads the first third of Pride and Prejudice knows what the two major couples in the end are going to be.

Bill Benzon - Jul 1, 2016

Hi Ben,

I'm sympathetic to your skepticism about sentiment analysis >> plot and am sympathetic to your nod to musical analysis. And I've got a very specific interest which I've been calling ring-form composition, because that's the standard name. But I'm now thinking of it as ring-form rhetorical structure for reasons that will emerge soon enough.

What's ring-form? A text with a linear structure like this: A B C X C B A. There's a structural center, and the other segments are such that the second half is a mirror of the first.

I got interested in ring-form in email discussions with Mary Douglas, the anthropologist. Toward the end of her career she got interested in the Old Testament, which is one of the areas where ring-form has traditionally been studied (Homeric epic is another). She ended up writing books on Numbers and on Leviticus in which she argued/demonstrated each is ring-form. Then she gave a series of lectures at Yale (published as Thinking in Circles) in which she laid down some rules of thumb about the form and argued, in one chapter, that Tristram Shandy exhibits ring-form.

Meanwhile I'd been finding ring-form in various texts: Osamu Tezuka's manga, Metropolis; the Nutcracker Suite, Sorcerer's Apprentice, and Pastoral Symphony episodes of Disney's Fantasia; Conrad's Heart of Darkness; Coppola's Apocalypse Now (loosely based on Heart of Darkness); the 1954 Japanese film, Gojira (mangled and deformed into Godzilla, King of the Monsters for an American audience); a few lyric poems; Obama's eulogy for Clementa Pinckney; and, of all things, Alan Liu's PMLA essay on meaning in DH. That's quite a variety of texts, not all of them narratives, and at least one not even artistic (Liu's essay). So, it's a rhetorical form. But where it informs a narrative, as it does in some of these instances, you identify the form with reference to what we ordinarily think of as the plot.
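Stated as a check on a sequence of section labels, the definition above is a palindrome with a single central element. The sketch below assumes the sections have already been identified and labeled by a reader, which is the genuinely hard part; the function name `is_ring_form` is hypothetical.

```python
def is_ring_form(sections):
    """True if the labels mirror around a single structural center."""
    labels = list(sections)
    has_center = len(labels) % 2 == 1  # one unpaired section in the middle
    return has_center and labels == labels[::-1]

print(is_ring_form(["A", "B", "C", "X", "C", "B", "A"]))
print(is_ring_form(["A", "B", "C", "X", "A", "B", "C"]))
```

The first sequence mirrors around X and passes; the second repeats its first half in the same order and does not.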

[continued in next comment]

Bill Benzon - Jul 1, 2016

[continued from previous comment]

Now, for Obama's eulogy for Clementa Pinckney. It's a sermon. What tipped me to the ring-form is that the word "grace" first appeared roughly mid-way in the sermon and then kept on to the end, where he hammered it and then segued into Amazing Grace. That told me it broke into two parts. Once I knew that, identifying the symmetry was not too difficult.

We've got a video of the eulogy. Obama was working in a tradition where audience response is important. That response, of course, is in the video. If we graphed the amplitude of the sound level we'd have a crude sentiment analysis. To my ear there's a noticeable increase at the structural center, but that pales in comparison to the climactic ending.

I'd love a computational routine that would be able to pick out ring-forms in actual narratives. But I'm skeptical. Would a sentiment analysis of Heart of Darkness identify the structural center? I don't know and really have no way of guessing. I can tell you that the structural center occurs in the longest paragraph in the text, but I don't think that that would be a generally useful clue to much of anything. It just happens to be an interesting feature of this rather peculiar text.

I have no general conclusions to offer. This is just stuff that I've somewhat laboriously managed to dig up over the years. I've got a bunch of posts on ring-form at my blog, New Savanna, but this post lists most of the best ones along with some explanatory text: Literary Studies from a Martian Point of View: An Open Letter to Charlie Altieri. Here's a working paper that gathers some of them into a PDF: Ring Composition: Some Notes on a Particular Literary Morphology. Here's the Obama stuff: Obama's Eulogy for Clementa Pinckney: Technics of Power and Grace.

Ben - Jul 1, 2016

Interesting. The musical equivalent, you probably know, is the arch form that's particularly associated with Bartók. I seem to recall some rondo forms that are something similar with a filler recap (e.g., ABACADACABA), but can't immediately come up with an example.

What's important here of course is *local* similarities rather than overall structures. It reminds me in a lot of ways of Andrew Piper's work on conversional narratives (http://txtlab.org/?p=459), which is an entirely different form but might be computationally tracked in similar ways. The big challenge with a ring/arch form, though, is sectional boundaries. Again there are some good computational musicological models out there, potentially, though prose echoes in a modified section are much less strong than poetic ones.

Bill Benzon - Jul 1, 2016

Marking boundaries, yes. Criterion #4 of the 7 Mary Douglas listed: Indicators to mark individual sections.

Scott - Jul 1, 2016

My concerns about this go even deeper, and I think basically invalidate the entire project. I think the results that SVD gives here strongly suggest that these sentiment patterns are just random walks, plain and simple: Brownian noise.

The results Reagan et al. get are very strange. Entirely reshuffling a random walk gives white noise, which yields chaotic SVD eigenfunctions. They say they used random permutations to generate "word salad," but their SVDs look like Brownian noise. I was only able to get white noise to look like Brownian noise with heavy filtering. (I used a few different constant averages to do that.)

If it were really white noise, the SVD eigenfunctions would also look like white noise. And if it were Brownian noise with hidden structure (which seems to be the fundamental claim), then we should expect the SVD to pick up on that hidden structure. I've been investigating this for the last few days; here's a notebook illustrating the results:

https://github.com/senderle/svd-noise/blob/master/Noise.ipynb

The last example shows what the SVD looks like when there's hidden regularity in the data. It's very obvious. The hidden regularity would have to be extremely subtle given the results in Reagan et al.
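The white-versus-Brownian contrast described in this comment can be sketched without the linked notebook. What follows is a separate illustration, not the notebook's code: the matrix sizes and the roughness measure `top_mode_roughness` are assumptions chosen here, and Gaussian noise stands in for sentiment scores.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_steps = 500, 200

white = rng.normal(size=(n_series, n_steps))  # independent at every step
brownian = np.cumsum(white, axis=1)           # running sum: a random walk

def top_mode_roughness(matrix):
    """Sum of squared point-to-point jumps in the first right singular vector."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    mode = vt[0]  # unit-length vector indexed by time step
    return float(np.sum(np.diff(mode) ** 2))

white_rough = top_mode_roughness(white)
brownian_rough = top_mode_roughness(brownian)
print(f"white: {white_rough:.3f}, Brownian: {brownian_rough:.5f}")
```

The leading mode of the white-noise matrix is itself jagged, while the leading mode of the random-walk matrix is a smooth, slowly varying curve. Smooth, harmonic-looking modes are therefore what accumulated noise produces on its own, with no narrative in the data.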

Scott - Jul 1, 2016

Following up: I think this reiterates your point, Ben, about smaller local regularities. But I suspect even smaller local regularities might show up in the SVD. I'm also looking into autocorrelation as a way of picking up those regularities.

Ben - Jul 1, 2016

Well, if it *is* Brownian noise they test against, that's better from my perspective. (Though if it's actually just that they can't make something random when trying, that's not great.)

I don't know if it totally invalidates; the results here remind me of what I found in using PCA on topic space, where you also get harmonic curves but more strongly than when working with random data (notebook). To say that something is more harmonic than Brownian noise is not nothing. And why shouldn't some harmonic functions be the base elements of sentiment trajectories? They still may be useful.

But yeah, all those high-order vacillations do seem to indicate a whole lot of nothing to me. I don't want to get too far into reiterating the debates of last year, though.

The presence of so many non-narrative works in the test set, of course, also makes it possible their method is good and that they're mostly losing by testing on junk.

Bill Benzon - Jul 1, 2016

Since we're talking Fourier analysis and harmonic curves, I'll offer some remarks specifically about Heart of Darkness. As you know, Kurtz is a central character, but he doesn't appear until a ways into the text. So I got curious about his presence in the text. I divided the text into equal sized bins of 500 words, counted the occurrences of "Kurtz" in each bin and graphed the results. (Seems I'd done what's called a periodogram.) It turns out that "Kurtz" appears in the text of Heart of Darkness at periodic intervals. There is a short cycle of roughly 2000 words and a longer one that divides the text into four sections: an initial section with no appearances, a second section with relatively low activity, and a third section with more activity. As a check, I repeated the procedure with bins of 600 words.

I wonder what other periodicity we'd find in this text. And other texts? And if we ran Jockers's procedure on HoD?

I've written this up in a short working paper: Periodicity in Heart of Darkness.
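The binning procedure described here is simple to sketch, along with a Fourier-based periodogram of the resulting counts. The code below is an illustration under stated assumptions, not the working paper's method: the tokenizer is a crude regular expression, the function names are invented, and the toy text is synthetic (built so the counts cycle every 40 words) rather than the novel itself.

```python
import re
import numpy as np

def bin_counts(text, word, bin_size=500):
    """Count occurrences of `word` in consecutive bins of `bin_size` tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude letters-only tokenizer
    hits = np.array([tok == word.lower() for tok in tokens], dtype=float)
    n_bins = len(hits) // bin_size  # drop the ragged final bin
    return hits[: n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)

def periodogram(counts):
    """Power at each frequency (in cycles per bin) of the mean-centered counts."""
    centered = counts - counts.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(counts))
    return freqs, power

# Synthetic text: "kurtz" counts per 10-word bin cycle through 4, 2, 0, 2.
cycle = ("kurtz " * 4 + "river " * 6 + "kurtz " * 2 + "river " * 8
         + "river " * 10 + "kurtz " * 2 + "river " * 8)
toy_text = cycle * 50

counts = bin_counts(toy_text, "Kurtz", bin_size=10)
freqs, power = periodogram(counts)
peak_freq = freqs[np.argmax(power)]
print("dominant period in words:", 10 / peak_freq)
```

Rerunning with a different `bin_size`, as the comment does with 600 words, is a reasonable check that a peak reflects the text and not the binning.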

Ted Underwood - Jul 1, 2016

Approaching this predictively is a great idea. So's your idea about tension.

My gut feeling is that there's room to do something really interesting with plot, but it's going to be slow initially. It's one of those projects where you could work for a year on cutting-edge NLP in order to get results that literary critics would look at and say "yeah, we knew that."

Right now I'm working on some issues related to time (duration) in fiction. Scene vs. summary. Not the same thing as plot, but I hope it's a little more tractable.

Bill Benzon - Jul 1, 2016

Not sure just what you mean by scene, Ted, but when I found ring composition in Metropolis and Gojira the physical setting of the action was the main clue.

Ted Underwood - Jul 1, 2016

Also, I really agree that you're right that it's a strategic blunder to start with novels, because Vonnegut may well be *wrong* about novels.

In any case, assuming he's right is not a safe starting point. I tried to say a version of that last year: https://tedunderwood.com/2015/04/01/free-research-question-about-plot/.

Scott - Jul 2, 2016

It looked more to me like the sentiment data was just as harmonic as Brownian noise, and that the word salad data was also just as harmonic as Brownian noise, but with a smaller amplitude. If you look at the singular value graphs, you'll see that they have the same shape; it's just that one is from a lower power signal. That's just a scaling effect. You can get it just by dividing the original signal by a constant.

So far I have been able to reproduce every result they describe in that paper using noise generators alone. (However, I still need to try hierarchical clustering, and I don't grok the SOM stuff yet.)

The key point for me is that if there were clearly large-scale, deterministic patterns hidden in the noise, the most significant eigenfunctions produced by SVD probably wouldn't look like sine waves at all. The Wikipedia article on the Karhunen–Loève theorem is instructive here.
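The scaling effect is a general property of the SVD and can be checked directly. A minimal sketch, assuming a matrix of random walks as the signal and an arbitrary constant of 4; neither comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=(50, 100)), axis=1)  # 50 random walks
scale = 4.0  # arbitrary constant, chosen for illustration

s_full = np.linalg.svd(signal, compute_uv=False)
s_small = np.linalg.svd(signal / scale, compute_uv=False)

# Dividing the data by a constant divides every singular value by the same
# constant, so the spectrum keeps its shape and only its height changes.
print(np.allclose(s_small * scale, s_full))
```

Two singular value curves that differ only by a constant factor therefore say nothing about one signal being more structured than the other.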

Ben - Jul 2, 2016

That's definitely possible: I wasn't looking at amplitudes, and I have a particular brief here, since I was using PCA on the assumption that it would extract arc shapes for this last year and before, and the validity of what I said depended entirely on amplitude/scree plot shape (are those functionally the same here?). So when you say "eigenfunctions produced by SVD probably wouldn't look like sine waves at all," it's worth saying, maybe, "But not definitely!" After all, Jockers chose Fourier transforms in part, I think, because he thought plots *would* be on some level harmonic.

If I can make a guess: the SOM looks like the thing they *actually* wanted to use to identify basic stories, because it has the wow factor of a neural network learning to read stories. But the shapes it puts out are not very good: significant overlap between trends, some essentially flat lines in the top 6, etc.; so they went with SVD for the headline instead. SVD, at least, will never say that one of the most common story structures is "nothing happens," while SOM can and will.

Bill Benzon - Jul 2, 2016

You know, Ted, I can't say I'm surprised that you didn't get significant results when you tried sentiment analysis on Shakespearean comedies and tragedies. There's a remark in Frye's Anatomy of Criticism to the effect that a tragedy is a truncated comedy. Now he was analyzing comedy as a three part structure where there's a middle section in which the social order is weakened and the protagonist is in deep trouble. But then, there's a reconciliation in the third section and all's well that ends well. To create a tragedy, just drop that third move.

FWIW, when I read Romeo and Juliet in graduate school it felt like a comedy for at least the first half or so. And Shakespeare wrote it at a period when he was writing mostly comedies and histories. That is, he wrote it before he'd moved on to mostly tragedies.

I've done a lot of thinking about Much Ado About Nothing (a comedy), Othello (a tragedy) and The Winter's Tale (a romance or tragi-comedy). All three plays center on a couple where the man is deceived into thinking his beloved has been unfaithful. In the comedy the deception happens before the marriage takes place. In the tragedy the deception takes place between the marriage and the consummation of the marriage. In the romance the deception takes place well into the marriage where there is a six-year old son and another child on the way. There's a pile of other features that fit into a rather elegant pattern across the three plays (and I've published on this, but I'm embarrassed at always citing my own work on this or that, so I won't).

Now, as a critic who's looked quite carefully at a number of plots in a number of texts, I'd love to be able to see a sentiment analysis in relation to plot. But I'm not even sure what a representation of plot would look like in this domain.

You know there's that weird and wonderful essay by Steve Ramsay in which he manages to more or less classify Shakespeare plays into the standard genres by looking at how the action moves from one physical setting to another. WTF! Do we have a clue about what's going on? That's plot in a rather rarified way, with nothing directly about what actually happens. It's just that something happens somewhere, and then something else happens in a different place. And yet, in two of my ring-form texts (Metropolis and Gojira) the form can (mostly) be identified by noticing how the action moves from one place to another. (Does this have anything to do with Moretti's maps?)

There are Vladimir Propp style plot functions. Do we want to identify them algorithmically and constitute plot as a sequence of such things? I don't know, but color me skeptical. Could we start by differentiating between a happy and a sad ending? Well, if sentiment analysis couldn't discriminate between Shakespearean tragedy and comedy, maybe not.

So yes, there are lots of opportunities for further research here. But what are they?

Scott - Jul 3, 2016

"But not definitely!" indeed!

I had not entirely put it together that your SVD method from last year might overlap with theirs. And the 2D case may well be more complex.

There are just so many problems with that paper. Collections of short stories, poetry, nonfiction throughout, and, as Ted said, novels would already be hard. And so on. It maddens me to see that it has given this "X fundamental plots" meme unholy new life.