But the third thing we need is we need more folks in engineering, math, science, technology, computer science. (Applause.) And that means we’ve got to have a school system generally that encourages those subjects. And, by the way, I was a political science and English major, and you need to know how to communicate, and I loved the liberal arts, so this is no offense, but we’ve got enough lawyers like me. We need more engineers. (Applause.) We need more scientists.
[@obama_remarks_2014]
Google Ngrams
Turning machine-readable books into machine-read books for classification.
Humans have rich, fuzzy understandings with allowance for uncertainty.
Computers force things into lifeless abstractions.
Humans have rich, fuzzy understandings with allowance for uncertainty.
Bureaucracies force things into lifeless abstractions.
Computers (nowadays) have fairly rich, fuzzy understandings with allowance for uncertainty.
Prediction: Short, computer-readable embeddings of collections items will be an increasingly important shared resource for non-consumptive digital scholarship.
Rather than full text, a new method I’m calling “Stable Random Projection”:
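One way to picture a hash-based random projection, as a minimal sketch under stated assumptions rather than the method's actual implementation: each word is hashed to a fixed pattern of +1/-1 signs, and a document's vector is the sum of those patterns weighted by log word counts. Because the signs come from a hash of the word itself, the same text always yields the same vector, with no shared vocabulary file needed. The function names, the 64 dimensions, the SHA-256 hash, the whitespace tokenizer, and the log weighting are all illustrative choices.

```python
import hashlib
import math
from collections import Counter

DIM = 64  # illustrative size; an assumption, not a value from the talk


def word_signs(word, dim=DIM):
    """Map a word to a fixed list of +1/-1 values derived from its hash."""
    bits = []
    counter = 0
    # Chain SHA-256 digests until there are enough bits for `dim` signs.
    while len(bits) < dim:
        digest = hashlib.sha256(f"{word}:{counter}".encode("utf-8")).digest()
        for byte in digest:
            for shift in range(8):
                bits.append((byte >> shift) & 1)
        counter += 1
    return [1 if bit else -1 for bit in bits[:dim]]


def project(text, dim=DIM):
    """Sum the sign patterns of all words, weighted by log(1 + count)."""
    counts = Counter(text.lower().split())
    vector = [0.0] * dim
    for word, count in counts.items():
        weight = math.log1p(count)
        for i, sign in enumerate(word_signs(word, dim)):
            vector[i] += weight * sign
    return vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Two texts can then be compared with `cosine(project(text_a), project(text_b))` without either full text being exchanged.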
Classifier suites:
Re-usable batch training code in TensorFlow.
One-hidden-layer neural networks can help transfer metadata between corpora.
Protocol: 90% training, 5% validation, 5% test.
Books only (no serials).
All languages at once.
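The 90/5/5 protocol and the one-hidden-layer shape can be sketched in plain Python. This is an illustration only, not the talk's TensorFlow training code: the function names, the fixed seed, the ReLU activation, and the softmax output are assumptions.

```python
import math
import random


def split_90_5_5(items, seed=0):
    """Shuffle items reproducibly and cut them into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 90 // 100
    n_val = n * 5 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # remainder, roughly 5%


def forward(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a softmax over classes.

    w1 holds one weight row per hidden unit; w2 holds one row per class.
    """
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(hi * wi for hi, wi in zip(hidden, row)) + b
              for row, b in zip(w2, b2)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Here `x` would be a short document embedding and the output a probability for each class, such as a Library of Congress subclass; the validation slice is for tuning, and the test slice is touched only once at the end.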
Classifiers trained on Hathi metadata can predict:
Library of Congress Classification
Instances | Class name (randomly sampled from full population) |
---|---|
461 | AI [Periodical] Indexes |
6986 | BD Speculative philosophy |
9311 | BJ Ethics |
40335 | DC [History of] France - Andorra - Monaco |
2738 | DJ [History of the] Netherlands (Holland) |
14928 | G GEOGRAPHY. ANTHROPOLOGY. RECREATION [General class] |
17353 | HN Social history and conditions. Social problems. Social reform |
4703 | JV Colonies and colonization. Emigration and immigration. International migration |
23 | KB Religious law in general. Comparative religious law. Jurisprudence |
5583 | LD [Education:] Individual institutions - United States |
3496 | NX Arts in general |
6222 | PF West Germanic languages |
68144 | PG Slavic languages and literatures. Baltic languages. Albanian language |
157246 | PQ French literature - Italian literature - Spanish literature - Portuguese literature |
6863 | RJ Pediatrics |
Misclassifications
Misclassifications: mdp.39015005002905
Misclassifications: uva.x000423222
Actual LC Classification: QB63.B5 1927
<record>
<leader>00820nam a22002291 4500</leader>
<controlfield tag="001">006496938</controlfield>
<controlfield tag="003">MiAaHDL</controlfield>
<controlfield tag="005">20130926000000.0</controlfield>
<controlfield tag="006">m d </controlfield>
<controlfield tag="007">cr bn ---auaua</controlfield>
<controlfield tag="008">880505s1927 ksu 00110 eng </controlfield>
<datafield tag="010" ind1=" " ind2=" ">
<subfield code="a"> 27024000</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">sdr-nrlfGLAD17073443-B</subfield>
</datafield>
<datafield tag="035" ind1=" " ind2=" ">
<subfield code="a">(OCoLC)6046903</subfield>
</datafield>
<datafield tag="040" ind1=" " ind2=" ">
<subfield code="a">DLC</subfield>
<subfield code="c">OKN</subfield>
<subfield code="d">CUY</subfield>
<subfield code="d">ZEPHIR</subfield>
</datafield>
<datafield tag="050" ind1="0" ind2=" ">
<subfield code="a">QB63</subfield>
<subfield code="b">.B5 1927</subfield>
</datafield>
<datafield tag="090" ind1=" " ind2=" ">
<subfield code="a"> QR63</subfield>
<subfield code="b">.B5</subfield>
</datafield>
(etc...)
Actual LC Classification: QB63.B5 1927
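The record carries two call numbers that disagree: QB63 in field 050 (Library of Congress call number) and QR63 in field 090 (local call number). A minimal sketch of how such a disagreement could be flagged automatically, using Python's standard `xml.etree.ElementTree` on a closed-off copy of the two fields. The helper names are hypothetical, and the sketch assumes un-namespaced elements as shown here; MARCXML files that declare a namespace would need namespace-aware paths.

```python
import xml.etree.ElementTree as ET

# Closed-off copy of the two call-number fields from the record above.
RECORD = """<record>
  <datafield tag="050" ind1="0" ind2=" ">
    <subfield code="a">QB63</subfield>
    <subfield code="b">.B5 1927</subfield>
  </datafield>
  <datafield tag="090" ind1=" " ind2=" ">
    <subfield code="a"> QR63</subfield>
    <subfield code="b">.B5</subfield>
  </datafield>
</record>"""


def subfield_a(record, tag):
    """Return the stripped subfield 'a' of the first datafield with `tag`."""
    node = record.find(f"./datafield[@tag='{tag}']/subfield[@code='a']")
    return node.text.strip() if node is not None and node.text else None


def lc_class(call_number):
    """Leading letters of a call number, e.g. 'QB63' -> 'QB'."""
    letters = ""
    for ch in call_number:
        if not ch.isalpha():
            break
        letters += ch
    return letters


record = ET.fromstring(RECORD)
lc = subfield_a(record, "050")
local = subfield_a(record, "090")
# Report both class prefixes and whether they disagree.
print(lc_class(lc), lc_class(local), lc_class(lc) != lc_class(local))
```

A classifier trained on one of these fields will be scored as wrong whenever its prediction matches the other, so some apparent misclassifications are really disagreements within the metadata.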
Classifier online.
“Statistics on their own, enticing in their seeming neutrality, failed to address or unpack black life hidden behind the archetypes, caricatures, and nameless numbered registers of human property slave owners had left behind. And cliometricians failed to remove emotion from the discussion. Data without an accompanying humanistic analysis—an exploration of the world of the enslaved from their own perspective—served to further obscure the social and political realities of black diasporic life under slavery.”
“Data is the evidence of terror, and the idea of data as fundamental and objective information, as Fogel and Engerman found, obscures rather than reveals the scene of the crime.”
More importantly, the lack of engagement with economic historians limited the analytical perspectives of each of these books. Most of them seem aware of Fogel and Engerman’s Time on the Cross (1974), and some repeat its arguments about the profitability of slavery or the efficiency of slave plantations. But they do not seem to have taken seriously the debates among economic historians that followed the publication of that book. Some […] challenged Fogel and Engerman[; but] analyzed slavery in new ways.
Hilt, Eric. “Economic History, Historical Analysis, and the ‘New History of Capitalism.’” The Journal of Economic History 77, no. 2 (June 2017).
In the past, historians and economists (sometimes working as a team) collectively advanced the understanding of slavery, southern development, and capitalism. There was a stimulating dialog. That intellectual exchange deteriorated in part because some economists produced increasingly technical work that was sometimes beyond the comprehension of many historians. Some historians were offended by some economists who overly flaunted their findings and methodologies.
Olmstead, Alan L., and Paul W. Rhode. “Cotton, Slavery, and the New History of Capitalism.” Explorations in Economic History 67 (January 2018).
Ash, Chen, and Naidu 2018
We supplemented this list with exact years of attendance from Annual Reports obtained by filing FOIA requests and correspondence from the Law and Economics Center at George Mason University. Figure 1 plots the share of Circuit Court cases with a Manne Judge on the panel over time. As can be seen, by the late nineties, about half of cases were directly impacted by a Manne panelist.
Ash, Chen, and Naidu 2018
This paper utilizes a dataset on all 380,000 cases (over a million judge votes) in Circuit Courts for 1891-2013, and a data set on one million criminal sentencing decisions in U.S. District Courts linked to judge identity (via FOIA request) for 1992-2011. We have detailed information on the judges and the metadata associated with the cases. In addition, we process the text of the written opinions to represent judge writing as a vector of phrase frequencies.
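As a rough illustration of what “a vector of phrase frequencies” can mean, the sketch below counts two-word phrases in a text and reads their relative frequencies off against a fixed phrase list. The tokenizer, the choice of bigrams, and the example phrases are assumptions made for illustration, not the procedure used by Ash, Chen, and Naidu.

```python
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text, keep alphabetic tokens, and pair adjacent tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def phrase_vector(text, vocabulary):
    """Relative frequency of each vocabulary phrase among the text's bigrams."""
    counts = Counter(bigrams(text))
    total = sum(counts.values())
    return [counts[phrase] / total if total else 0.0 for phrase in vocabulary]


# Hypothetical phrase list and opinion text.
vocabulary = ["market price", "the market", "due process"]
vector = phrase_vector("The market price exceeds the market price", vocabulary)
```

Each opinion becomes one such vector, so a judge's writing before and after some event can be compared dimension by dimension.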
Census Atlases
Jim Vallandingham Census Bump Charts
In a recent bulletin of the Superintendent of the Census for 1890 appear these significant words: “Up to and including 1880 the country had a frontier of settlement, but at present the unsettled area has been so broken into by isolated bodies of settlement that there can hardly be said to be a frontier line. In the discussion of its extent, its westward movement, etc., it can not, therefore, any longer have a place in the census reports.” This brief official statement marks the closing of a great historic movement. Up to our own day American history has been in a large degree the history of the colonization of the Great West.