Commodius vici of recirculation: the real problem with Syuzhet

Practically everyone in Digital Humanities has been posting increasingly epistemological reflections on Matt Jockers’ Syuzhet package since Annie Swafford posted a set of critiques of its assumptions. I’ve been drafting and redrafting one myself. One of the major reasons I haven’t posted it is that the obligatory list of links keeps growing. Suffice it to say that this is not a broad methodological disputation, but rather a single idea that crystallized after reading Scott Enderle on “sine waves of sentiment.” I’ll say what this all means for the epistemology of the Digital Humanities in a different post, to the extent that that’s helpful.

Here I want to say something much more specific: that Fourier transforms are the wrong “smoothing function” (insofar as that is the appropriate term to use) to choose for plots, because they assume plot arcs are periodic functions in which the beginning must align with the end. I’m pretty sure I’m right about this, but as usual I’m relying on an intuitive understanding of the techniques under discussion here rather than a deeply mathematical one. So let me know if I’m making a total ass of myself, and I’ll withdraw my statements here.

Even before Swafford posted her critique, I felt like there was something quite wrong about using the Fourier transform as a “smoothing” mechanism. Fourier transforms, in my experience with them, are bad at dealing with humanities data, because they rely on a very precise definition of “signal.” I’ve had to use wavelets instead of the Fourier transform in the past even to extract obviously periodic data from time series, because the assumptions of regularity in the Fourier transform are so strong that some periods are simply missed.

As I was reading Enderle’s post, it occurred to me that we’ve been graphing these Fourier-transformed waves with the x axis reading 1 to 100, as if it were a closed domain. But if plot is a sum of sine waves, that domain should really read from 0 to 2*pi. (Or, if you’re so inclined, from 0 to tau.) The difference is that waveforms are cyclical: this is the fundamental assumption of Fourier transforms, and it is the source of all the ringing artifacts that Swafford usefully points out. After 100 comes 101, but 2*pi is the same as zero. This assumption is true only for novels whose last sentence is aligned to feed back into their first, a rare breed indeed. (Although ironically, given the prominence of Portrait of the Artist in this debate, Joyce wrote one.)
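For anyone who wants the cyclical assumption spelled out, here is a short sketch in my own notation (none of these symbols come from syuzhet): N is the number of sampled points, X_k are the discrete Fourier coefficients of the sentiment series, and K is however many low-frequency components the filter keeps. Every retained term repeats after N steps, so the smoothed curve has to as well:

```latex
\hat{x}_n = \frac{1}{N} \sum_{k=-K}^{K} X_k \, e^{2\pi i k n / N},
\qquad
\hat{x}_{n+N} = \frac{1}{N} \sum_{k=-K}^{K} X_k \, e^{2\pi i k n / N} \, e^{2\pi i k} = \hat{x}_n ,
\quad \text{since } e^{2\pi i k} = 1 \text{ for every integer } k .
```

Because a curve built from only a few low frequencies changes slowly, its value at the last point has to sit close to its value one step later, which is the value at the first point.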

To put that graphically: this cyclicality means that syuzhet imposes an assumption that the start of a plot lines up with its end. If you generate an artificial plot that starts with sentiment “-5” and ends with sentiment “5”, it looks like this with normal smoothing methods (rolling average or loess).

 

[Figure: the artificial plot under rolling-average and loess smoothing]

But if you try to use syuzhet’s filter, it comes up looking completely different: wavy.

[Figure: the same artificial plot passed through syuzhet’s Fourier filter]
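If you want to reproduce the effect without the screenshots, here is a minimal Python sketch of the same experiment. It is a stand-in, not syuzhet’s own code: the function names, the choice to keep three low-frequency components, and the 11-point window are all assumptions of mine, and I use a plain moving average rather than loess.

```python
import numpy as np

def fourier_lowpass(values, keep=3):
    """Keep the mean plus the `keep` lowest-frequency components; zero the rest.

    Hypothetical helper: a generic low-pass filter, not syuzhet's implementation.
    """
    spectrum = np.fft.rfft(values)
    spectrum[keep + 1:] = 0  # discard every component above the cutoff
    return np.fft.irfft(spectrum, n=len(values))

def rolling_mean(values, window=11):
    """Moving average, reported only where the full window fits."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Artificial "plot": sentiment rises steadily from -5 to 5 over 100 sentences.
sentiment = np.linspace(-5, 5, 100)

fourier = fourier_lowpass(sentiment, keep=3)
rolling = rolling_mean(sentiment, window=11)

# Net change from the first smoothed point to the last.
print("raw net change:    ", sentiment[-1] - sentiment[0])
print("rolling net change:", rolling[-1] - rolling[0])
print("fourier net change:", fourier[-1] - fourier[0])
```

The moving average keeps most of the ten-point rise, while the low-pass Fourier version ends within a point of where it began, because its endpoints are pulled toward each other.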

 

This holds true on real documents. I ran it on every State of the Union address since 1960. I’ve added dashed lines to show the overall sentiment movement in the address. Blue shows loess smoothing from beginning to end, and red shows the Fourier transform. As you can see, loess allows plots to get happier or sadder: Fourier forces them to return almost to their starting place.

All the code for this is online here: you can try it on your own plots as desired.

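[Figure: sentiment in each State of the Union address since 1960, with loess smoothing in blue, the Fourier transform in red, and dashed lines for the overall movement]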

I can see no sound reason to do this. Plots can start sad and get happy. But if you look at Jockers’ six “fundamental plots,” all start and end in the same approximate emotional register. This, I think, is an artifact of the assumptions of periodicity built into the Fourier transform, not the underlying plots. There’s no room in this world for Vonnegut’s “From bad to worse,” or for any sort of rags to riches. It treats plot as a zero-sum game.

If I’m not misunderstanding something here, this should convince Jockers to retire the waveform assumptions in favor of something like loess smoothing or moving averages, so digital humanists can move on to talking about something other than “ringing artifacts.” I don’t think this is devastating for the Syuzhet package as a whole: it has absolutely nothing to do with the suitability of sentiment analysis for determining plot, which is a much more interesting question that others are contributing to. (I am still undecided whether I think my own method of plotting arcs through multidimensional topic spaces, which I originally came up with after misunderstanding something Jockers said to me a year ago about his idea for syuzhet, is better: I do think it adds something to the conversation.) One of the broader points my unfinished post makes is that we shouldn’t be taking failures in one component of a chain to mean the rest is unsound: that’s an oddly out-of-domain application of falsifiability.

 

 

3 thoughts on “Commodius vici of recirculation: the real problem with Syuzhet”

  1. Ted Underwood

    Nice. I tried the ascending line, too, and noticed it was weirdly deformed, but didn’t notice the cyclicity. I’m Mr. Agnostic lately, and in that persona I guess I’m required to observe that I can imagine someone who actually believed all plots are cyclic. The notion of the “plot arc” you were drawing on is loosely analogous — though, only loosely, because it’s really only positing that beginnings and ends are similar qua not-being-middles. Not the same thing as positing that they have the same sentiment polarity.

    That would be a very strong prior to impose! It’s definitely not a prior I would recommend using for exploration, without more evidence.

    1. Scott

      I, for one, will not stop believing in Mr. Agnostic. And something struck me last night as I was drifting off. What other kinds of assumptions could we credibly make about the beginning and ending of a story? They probably won’t both have the same altitude, but might they not have the same slope?

  2. ben Post author

    Even an agnostic can be convinced that certain particular gods don’t exist! There are certainly some domains where the ends must recreate the same fundamental configuration as the beginning in certain ways. (The network situation comedy designed for syndication in terms of character relations; the classical sonata in terms of key.) But put in terms of “positive/negative sentiment” or “fortune/misfortune,” it’s obviously not true for the novel. Anyone who said it was would just be regarded as bonkers. (And maybe, say, kicked out of his master’s program.)

    You’re right that it’s not dissimilar from my plot arcs, and I find this a useful way to think through some of the issues there. More to come. But plot arcs are about finding the principal distinguishing axes of plots, and then seeing what they are–it’s more straightforwardly exploratory, I’d say, because the PCA is only there to pull out the equivalent of a regression line of an undetermined sort. (I may just switch over to linear models to avoid the PCA horseshoe conversation). And of course what it finds is that the most important distinction is between the start and end of plots, and only when you factor that out is it possible to see what separates the ends from the middles. It would have been possible in theory for circular structures to emerge there, but they didn’t.
