Humanities Crisis

But the third thing we need is we need more folks in engineering, math, science, technology, computer science. (Applause.) And that means we’ve got to have a school system generally that encourages those subjects. And, by the way, I was a political science and English major, and you need to know how to communicate, and I loved the liberal arts, so this is no offense, but we’ve got enough lawyers like me. We need more engineers. (Applause.) We need more scientists.

[@obama_remarks_2014]

Data sources

Google Ngrams

02138

Google Ngrams

Gatsby

Making the Library Legible

Turning machine-readable books into machine-read books for classification.

Humans have rich, fuzzy understandings with allowance for uncertainty.

Computers force things into lifeless abstractions.

Humans have rich, fuzzy understandings with allowance for uncertainty.

Bureaucracies force things into lifeless abstractions.

Computers (nowadays) have fairly rich, fuzzy understandings with allowance for uncertainty.

Prediction: Short, computer-readable embeddings of collection items will be an increasingly important shared resource for non-consumptive digital scholarship.

Rather than full text, a new method I’m calling “Stable Random Projection”:

  • Turn each book into 1,280 numbers based on its words.
  • The numbers are a random projection of log word counts.
  • Unlike other dimensionality reductions, it can work on all languages simultaneously.
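A minimal sketch of the idea in the bullets above, not the actual implementation. It assumes whitespace tokenization, `log(1 + count)` weighting, and SHA-256 as the source of each word's random signs; all three are illustrative choices, and the names `word_signs` and `project` are hypothetical. Because each word's signs come from a hash of the word itself, no shared vocabulary is needed, which is why one projection can cover texts in any language.

```python
import hashlib
import math
from collections import Counter

import numpy as np

DIM = 1280  # number of output dimensions named in the slide


def word_signs(word: str, dim: int = DIM) -> np.ndarray:
    """Map a word to a fixed vector of +1/-1 values derived from its hash.

    Assumption: SHA-256 digests are concatenated until there are enough bits.
    """
    n_bytes = (dim + 7) // 8
    chunks = []
    counter = 0
    while sum(len(c) for c in chunks) < n_bytes:
        chunks.append(hashlib.sha256(f"{counter}:{word}".encode("utf-8")).digest())
        counter += 1
    raw = np.frombuffer(b"".join(chunks), dtype=np.uint8)[:n_bytes]
    bits = np.unpackbits(raw)[:dim]
    return bits.astype(np.float64) * 2.0 - 1.0  # 0 -> -1, 1 -> +1


def project(text: str, dim: int = DIM) -> np.ndarray:
    """Sum each word's sign vector, weighted by the log of its count."""
    counts = Counter(text.lower().split())  # assumption: whitespace tokens
    vec = np.zeros(dim)
    for word, n in counts.items():
        vec += math.log(1 + n) * word_signs(word, dim)
    return vec
```

Calling `project(book_text)` on the same text always returns the same 1,280 numbers, so vectors computed at different institutions remain comparable.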

Classifier suites:

  1. Re-usable batch training code in TensorFlow.

  2. One-hidden-layer neural networks can help transfer metadata between corpora.

  3. Protocol: 90% training, 5% validation, 5% test.

  4. Books only (no serials).

  5. All languages at once.
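The 90/5/5 protocol in the list above could be set up roughly as follows. This is a sketch under assumptions: the function name `split_indices`, the fixed seed, and the use of a single random shuffle are illustrative and are not taken from the training code the slide refers to.

```python
import numpy as np


def split_indices(n_items: int, seed: int = 0,
                  train: float = 0.90, validation: float = 0.05):
    """Shuffle item indices once, then cut them into train/validation/test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = round(n_items * train)
    n_val = round(n_items * validation)
    train_idx = order[:n_train]
    val_idx = order[n_train:n_train + n_val]
    test_idx = order[n_train + n_val:]  # whatever remains, about 5%
    return train_idx, val_idx, test_idx
```

Fixing the seed keeps the test set the same across runs, so results from different classifiers stay comparable.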

Classifiers trained on Hathi metadata can predict:

  1. Language
  2. Authorship among the top 1,000 authors with > 95% accuracy. (Too good to be true.)
  3. Presence of multiple subject heading components (e.g., ‘650z: Canada – Quebec – Montreal’) with ~50% precision and ~30% recall.
  4. Year of publication, with a median error of ~4 years.
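For readers unfamiliar with the metrics in the list above, here is a small sketch of how precision, recall, and median error are computed. The function names and the toy inputs are hypothetical; the figures reported in the list come from the classifiers, not from this code.

```python
from statistics import median


def precision_recall(predicted: set, actual: set):
    """Precision: share of predicted labels that are right.
    Recall: share of actual labels that were found."""
    true_pos = len(predicted & actual)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(actual) if actual else 0.0
    return precision, recall


def median_abs_error(predicted_years, actual_years):
    """Median of the absolute differences between paired years."""
    return median(abs(p - a) for p, a in zip(predicted_years, actual_years))
```

A classifier that proposes two subject components for a book, one of them correct, when the catalog record has three, scores 50% precision and about 33% recall on that book.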

Library of Congress Classification

  • Shelf locations of books.
  • Widely used by research libraries in United States.
  • ~220 “subclasses” at first level of resolution.

Instances  Class  Name (randomly sampled from full population)
      461  AI     [Periodical] Indexes
     6986  BD     Speculative philosophy
     9311  BJ     Ethics
    40335  DC     [History of] France - Andorra - Monaco
     2738  DJ     [History of the] Netherlands (Holland)
    14928  G      GEOGRAPHY. ANTHROPOLOGY. RECREATION [General class]
    17353  HN     Social history and conditions. Social problems. Social reform
     4703  JV     Colonies and colonization. Emigration and immigration. International migration
       23  KB     Religious law in general. Comparative religious law. Jurisprudence
     5583  LD     [Education:] Individual institutions - United States
     3496  NX     Arts in general
     6222  PF     West Germanic languages
    68144  PG     Slavic languages and literatures. Baltic languages. Albanian language
   157246  PQ     French literature - Italian literature - Spanish literature - Portuguese literature
     6863  RJ     Pediatrics

Chihuahua or Muffin

Misclassifications

  • Actual: HV: Social and Public Welfare/Criminology -> Welfare -> Protection, Assistance, Relief -> Special classes.
  • Algorithm: US Law.

Misclassifications: mdp.39015005002905

  • Actual LC Classification According to Hathi: AC 277 (Undefined)
  • Algorithm: DC (French History)
  • Shelf Location in Michigan: HC 277 (Economic History, France)

Misclassifications: uva.x000423222

  • Actual LC Classification: BF1613 Magic (White and Black). Occult Sciences -> Shamanism. Hermetics. Necromancy -> General Works, German, post-1800
  • Algorithm: BP: Theosophy, etc. BP595: Works by and about Rudolf Steiner.

Misclassifications

  • Actual LC Classification: QH1.A43 (Natural History)
  • Algorithm: QE (Geology)

Actual LC Classification: QB63.B5 1927

Bacteriology 

  • QB 63: Astronomy -> Stargazer’s guides.
  • QR 63: Microbiology -> Laboratory manuals

Misclassifications

  • BF 1611: Magic (White and Black). Shamanism. Hermetics. Necromancy -> General Works
  • Algorithm Says: HS
  • HS445.A2: Freemasons -> Masonic Law -> By Region or Country -> United States -> By State -> Constitutions.

Classifier online.

http://benschmidt.org/static_class/

Quantification

“Statistics on their own, enticing in their seeming neutrality, failed to address or unpack black life hidden behind the archetypes, caricatures, and nameless numbered registers of human property slave owners had left behind. And cliometricians failed to remove emotion from the discussion. Data without an accompanying humanistic analysis—an exploration of the world of the enslaved from their own perspective—served to further obscure the social and political realities of black diasporic life under slavery.”

  • Johnson, Jessica Marie. “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads.” Social Text 36, no. 4.

“Data is the evidence of terror, and the idea of data as fundamental and objective information, as Fogel and Engerman found, obscures rather than reveals the scene of the crime.”

  • Johnson, Jessica Marie. “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads.” Social Text 36, no. 4.

More importantly, the lack of engagement with economic historians limited the analytical perspectives of each of these books. Most of them seem aware of Fogel and Engerman’s Time on the Cross (1974), and some repeat its arguments about the profitability of slavery or the efficiency of slave plantations. But they do not seem to have taken seriously the debates among economic historians that followed the publication of that book. Some […] challenged Fogel and Engerman[; but] analyzed slavery in new ways.

Hilt, Eric. “Economic History, Historical Analysis, and the ‘New History of Capitalism.’” The Journal of Economic History 77, no. 2 (June 2017).

In the past, historians and economists (sometimes working as a team) collectively advanced the understanding of slavery, southern development, and capitalism. There was a stimulating dialog. That intellectual exchange deteriorated in part because some economists produced increasingly technical work that was sometimes beyond the comprehension of many historians. Some historians were offended by some economists who overly flaunted their findings and methodologies.

Olmstead, Alan L., and Paul W. Rhode. “Cotton, Slavery, and the New History of Capitalism.” Explorations in Economic History 67 (January 1, 2018).


Ash, Chen and Naidu 2018

We supplemented this list with exact years of attendance from Annual Reports obtained by filing FOIA requests and correspondence from the Law and Economics Center at George Mason University. Figure 1 plots the share of Circuit Court cases with a Manne Judge on the panel over time. As can be seen, by the late nineties, about half of cases were directly impacted by a Manne panelist.

Ash, Chen and Naidu 2018

This paper utilizes a dataset on all 380,000 cases (over a million judge votes) in Circuit Courts for 1891-2013, and a data set on one million criminal sentencing decisions in U.S. District Courts linked to judge identity (via FOIA request) for 1992-2011. We have detailed information on the judges and the metadata associated with the cases. In addition, we process the text of the written opinions to represent judge writing as a vector of phrase frequencies.

Scrollership

Hathi Features

Census Atlases

2007 Census Atlas reproductions

“Modern” Census Atlas by Nathan Yau

Brooklyn Brainery Christmas Gifts

Jim Vallandingham Census Bump Charts

National Museum of the American Indian, 2014

In a recent bulletin of the Superintendent of the Census for 1890 appear these significant words: “Up to and including 1880 the country had a frontier of settlement, but at present the unsettled area has been so broken into by isolated bodies of settlement that there can hardly be said to be a frontier line. In the discussion of its extent, its westward movement, etc., it can not, therefore, any longer have a place in the census reports.” This brief official statement marks the closing of a great historic movement. Up to our own day American history has been in a large degree the history of the colonization of the Great West.

Calculating the Center of New Jersey

Calculating centroids from paper

http://localhost:8085/Turner.html