Hansard Dec 14 2015

This is a first pass at exploring the potential of the Hansard corpus through a Bookworm browser.

I’ve split the native XML into individual speeches, using its intrinsic speaker tag to mark where each speech begins.
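
As a rough illustration of that kind of split, here is a minimal sketch in Python. The tag names (`debate`, `speaker`, `p`) and the sample text are invented for the example and may not match the real Hansard XML schema; the only idea carried over from the text is that a speaker tag opens a new speech.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The real Hansard XML uses its own schema;
# these tag names and this text are assumptions made for illustration.
SAMPLE = """<debate>
  <speaker>Speaker A</speaker>
  <p>I beg to move the resolution.</p>
  <p>It is a matter of some urgency.</p>
  <speaker>Speaker B</speaker>
  <p>I cannot agree.</p>
</debate>"""


def split_speeches(xml_text, speaker_tag="speaker"):
    """Return a list of (speaker, text) pairs, one per speech.

    Each element named `speaker_tag` opens a new speech; the text of every
    element that follows it is attached to that speech until the next
    speaker element appears. Tail text after child elements is ignored
    to keep the sketch short.
    """
    speeches = []
    current = None
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag == speaker_tag:
            current = {"speaker": (elem.text or "").strip(), "parts": []}
            speeches.append(current)
        elif current is not None and elem.text and elem.text.strip():
            current["parts"].append(elem.text.strip())
    return [(s["speaker"], " ".join(s["parts"])) for s in speeches]


speeches = split_speeches(SAMPLE)
for speaker, text in speeches:
    print(speaker, "|", text)
```

Anything that appears before the first speaker tag is dropped here, which may or may not be the right choice for the actual corpus.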

A “speech” can be very short; on average, each one in the Hansard corpus is 225 words.
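
A figure like that average can be computed along these lines. This is a sketch that assumes each speech is already available as a plain string and that a "word" is any whitespace-separated token; the function name `mean_words` is hypothetical, and the counting rule behind the 225 figure may differ.

```python
def mean_words(speech_texts):
    """Mean number of whitespace-separated tokens per speech.

    Returns 0.0 for an empty collection rather than raising.
    """
    counts = [len(text.split()) for text in speech_texts]
    return sum(counts) / len(counts) if counts else 0.0


# Toy input, not Hansard data.
example = ["I beg to move.", "Hear, hear."]
print(mean_words(example))
```

Because very short interjections count as speeches too, the median is worth checking alongside the mean.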