Ben Schmidt
2020-03-12
Slides: benschmidt.org/slides/hathidy
Package: https://github.com/HumanitiesDataAnalysis/hathidy
Extracted Features
General vision: wordcount data should be able to meet scholars where they are.
Teaching with HTRC features
Principle: counting, joining, and modeling are transferable skills that can be used on any data set.
So students need their own sets. Thus: 🐘
HTRC Feature Reader (Python): pandas integration
Hathidy (R): tidyverse integration
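To make the "counting, joining, and modeling" principle concrete, here is a minimal pandas sketch of the first two steps on a long-format table (one row per volume and token). The HTIDs, tokens, counts, and years are invented toy values; real extracted-features tables from the HTRC Feature Reader or Hathidy may use different column names.

```python
import pandas as pd

# Hypothetical toy data: one row per (volume, token). Values are invented.
counts = pd.DataFrame({
    "htid": ["vol.a", "vol.a", "vol.a", "vol.b", "vol.b"],
    "token": ["the", "whale", "sea", "the", "garden"],
    "count": [10, 3, 2, 8, 4],
})

# Hypothetical per-volume metadata to join against.
meta = pd.DataFrame({
    "htid": ["vol.a", "vol.b"],
    "year": [1851, 1911],
})

# Counting: total tokens per volume.
totals = (
    counts.groupby("htid", as_index=False)["count"].sum()
          .rename(columns={"count": "total"})
)

# Joining: attach totals and metadata, then compute each token's
# share of its volume.
tidy = (
    counts.merge(totals, on="htid")
          .merge(meta, on="htid")
          .assign(freq=lambda d: d["count"] / d["total"])
)
print(tidy)
```

The same group-then-join pattern carries over to any other tabular dataset, which is the point of teaching it on students' own volume sets.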
Core principles for working with extracted features.
Fast access means fast to code and fast to load.
Uniform model: every volume is accessed by its HathiTrust ID (HTID). The user is only dimly aware that files are involved.
But there are files, and they are cached on disk.
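A minimal sketch of HTID-only access backed by an on-disk cache follows. `load_features` and its `fetch` argument are hypothetical names, not hathidy's API: `fetch` stands in for whatever actually downloads a volume's features, and the percent-encoded file name is an illustrative choice, since HTIDs can contain `:` and `/`.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import quote

import pandas as pd


def load_features(
    htid: str,
    cache_dir: Path,
    fetch: Callable[[str], pd.DataFrame],
) -> pd.DataFrame:
    """Return one volume's features, addressed only by its HTID.

    Hypothetical sketch: on a cache miss, `fetch` is called once and its
    result is written to disk as CSV; later calls read the cached file.
    """
    # Percent-encode ':' and '/' so the HTID becomes a single safe file name.
    path = Path(cache_dir) / f"{quote(htid, safe='')}.csv"
    if path.exists():
        return pd.read_csv(path)
    df = fetch(htid)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    return df
```

The caller only ever passes an HTID; whether the data came from the network or from disk is invisible at the call site.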
Currently stored as CSV files in a pairtree directory layout; this is likely to change to a flat layout with Parquet files.
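For readers unfamiliar with pairtree, the sketch below maps an HTID to a pairtree directory. It assumes the character rules of the Pairtree specification and a `prefix/pairtree_root/<pairs>/<cleaned id>` layout; `htid_to_pairtree_dir` is a hypothetical helper, and hathidy's actual directory and file names may differ.

```python
from pathlib import PurePosixPath

# Characters the Pairtree spec hex-encodes as ^hh, in addition to
# anything outside visible ASCII (0x21-0x7E).
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions for common identifier punctuation.
_SUBST = {"/": "=", ":": "+", ".": ","}


def pairtree_clean(s: str) -> str:
    """Clean an identifier so it is safe to use as a path component."""
    out = []
    for ch in s:
        if ch in _SUBST:
            out.append(_SUBST[ch])
        elif ch in _HEX_ENCODE or not ("\x21" <= ch <= "\x7e"):
            # Encode each UTF-8 byte of the character as ^hh.
            out.extend(f"^{b:02x}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def htid_to_pairtree_dir(htid: str) -> PurePosixPath:
    """Hypothetical helper: HTID -> prefix/pairtree_root/.../cleaned_id."""
    # The library prefix is everything before the first '.'.
    prefix, _, rest = htid.partition(".")
    cleaned = pairtree_clean(rest)
    # Split the cleaned id into two-character "pairs" (the last may be one).
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(prefix, "pairtree_root", *pairs, cleaned)


print(htid_to_pairtree_dir("mdp.39015012345678"))
# mdp/pairtree_root/39/01/50/12/34/56/78/39015012345678
```

The nesting keeps any one directory small but makes paths deep; a flat layout would instead keep one file per cleaned ID in a single directory, which is simpler to list and to read as a Parquet collection.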