Posts with tag Bookworm
It’s not very hard to get individual texts in digital form. But working with grad students in the humanities looking for large sets of texts to do analysis across, I find that larger corpora are so hodgepodge as to be almost completely unusable. For humanists and ordinary people to work with large textual collections, they need to be distributed in ways that are actually accessible, not just open access.
I mentioned earlier that I’ve been doing some work on the old Bookworm project as I see that there’s nothing else that occupies quite the same spot in the world of public-facing, nonconsumptive text tools.
I used to blog everything that I did about a project like Bookworm, but have got out of the habit. There are some useful changes coming through the pipeline, so I thought I’d try to keep track of them, partly to update on some of the more widely used installations and partly
As I often do, I’m going to pull away from various forms of Internet reading/engagement through Lent. This year, this brings to mind one of my favorite stray observations about digital libraries that I’ve never posted anywhere.
Just some quick FAQs on my professor evaluations visualization: adding new ones to the front, so start with 1 if you want the important ones.
I promised Matt Jockers I’d put together a slightly longer explanation of the weird constraints I’ve imposed on myself for topic models in the Bookworm system, like those I used to look at the breakdown of typical TV show episode structures. So here they are.
I’ve been seeing how deeply we could integrate topic models into the underlying Bookworm architecture a bit lately.
I thought it would be worth documenting the difficulty (or lack thereof) in building a Bookworm on a small corpus: I’ve been reading too much lately about the Simpsons thanks to the FX marathon, so I figured I’d spend a couple hours making it possible to check for changing language in the longest-running TV show of all time.