In-browser library classification



This web application classifies arbitrary text into Library of Congress subclasses or another library classification scheme. While you wait, your browser downloads a neural network trained with Google's TensorFlow; you can then paste any text into the box below and click "classify" to see which subject areas the classifier thinks it belongs to. Your computer locally processes the text into an SRP (stable random projection) vector that the neural network can read, and then infers a number of classes. The numbers are expressed as certainties; there will be a guess for any text, regardless of whether it resembles a library book or not.
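To give a rough sense of what "processing the text into a projection" can mean, here is a minimal sketch of a hash-based random projection. It is an illustration, not the code this page runs: the tokenizer, the `log(1 + count)` weighting, the salting scheme, the default of 160 dimensions, and the names `word_signs` and `project` are all assumptions made for the example. The key idea it shows is that each word deterministically maps to the same sequence of +1/-1 values on every machine, so no shared vocabulary file is needed.

```python
import hashlib
import math
import re
from collections import Counter
from typing import List


def word_signs(word: str, dims: int = 160) -> List[int]:
    """Map a word to a fixed sequence of +1/-1 values taken from SHA-1 bits."""
    signs: List[int] = []
    salt = 0
    while len(signs) < dims:
        # Each SHA-1 digest yields 160 bits; re-hash with a new salt if more are needed.
        digest = hashlib.sha1(f"{word}_{salt}".encode("utf-8")).digest()
        for byte in digest:
            for bit in range(8):
                signs.append(1 if (byte >> bit) & 1 else -1)
        salt += 1
    return signs[:dims]


def project(text: str, dims: int = 160) -> List[float]:
    """Sum the sign vectors of all words, weighted by log(1 + count) (assumed weighting)."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    vector = [0.0] * dims
    for word, count in counts.items():
        weight = math.log(1 + count)
        for i, sign in enumerate(word_signs(word, dims)):
            vector[i] += weight * sign
    return vector
```

Because the mapping depends only on the hash function, the same text yields the same vector in any browser, which is what lets a pre-trained network read input it has never seen before.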

Click one of these buttons to load a different model.





For instance, the default text is a segment from Huckleberry Finn, which the classifier doesn't recognize as American Literature; instead it gives the book's original classification, PZ, "Fiction and Juvenile belles lettres."

Neural networks have a reputation for being uninterpretable, which is only partially fair; although the *model* as a whole is uninterpretable, individual decisions can be explained. If you click "Introspect Model," you can see an explanation of why the network made the decision that it did. The first pair of columns explains why it thought the top answer was plausible at all (or what the strongest negative scores were), while the second pair shows why it chose that answer over the other top two or three contenders. (In a multiclass problem like this, both are important: a word like "Paris" might increase the probability of something being American history compared to, say, aluminum processing, but decrease the chance of it being American history compared to French history.) These numbers are inferred by re-running the network many times, each time with a single word dropped out, to see how the output differs. They give some sense of what vocabulary the neural network has "learned," although some results may simply be random noise.
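The word-dropping procedure can be sketched in a few lines. This is a toy illustration under stated assumptions, not the page's implementation: `TOY_WEIGHTS` and `toy_scores` are a hypothetical stand-in for the neural network, `word_effects` is a hypothetical helper name, and the choice to drop every occurrence of a word at once is also an assumption. With no `rival`, the function measures how much each word supports the target class on its own; with a `rival`, it measures how much each word shifts the margin between the two classes.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical per-class word weights standing in for a trained network.
TOY_WEIGHTS: Dict[str, Dict[str, float]] = {
    "American history": {"paris": 0.5, "congress": 2.0},
    "French history": {"paris": 3.0},
    "Aluminum processing": {"smelting": 4.0},
}


def toy_scores(words: List[str]) -> Dict[str, float]:
    """Score each class by summing the weights of the words present."""
    return {
        label: sum(weights.get(w, 0.0) for w in words)
        for label, weights in TOY_WEIGHTS.items()
    }


def word_effects(
    words: List[str],
    scores: Callable[[List[str]], Dict[str, float]],
    target: str,
    rival: Optional[str] = None,
) -> Dict[str, float]:
    """Re-score the text with each word removed and report the change.

    A positive value means the word pushed the result toward `target`
    (relative to `rival`, when one is given).
    """
    def margin(ws: List[str]) -> float:
        s = scores(ws)
        return s[target] - (s[rival] if rival is not None else 0.0)

    baseline = margin(words)
    return {
        w: baseline - margin([x for x in words if x != w])
        for w in sorted(set(words))
    }


words = ["paris", "congress"]
absolute = word_effects(words, toy_scores, "American history")
contrast = word_effects(words, toy_scores, "American history", rival="French history")
```

In this toy setup "paris" has a positive effect in `absolute` but a negative one in `contrast`, which mirrors the Paris example above: the same word can support a class in isolation while counting against it relative to a close competitor.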