Debriefing on data journalism

At the Globe, Todd Wallack talked about getting three stories out of his work on parking tickets.

You're an editor at the Boston Globe, and you've just read Underwood & English on "Shifting Scales" and Michel et al. on "Quantitative Analysis of Culture." Each describes many different small findings.

From these two articles, come up with two headlines: the story you find most interesting personally, and the one you think would most interest the general public. Why are these the most interesting of the many stories told?

Your headlines DO NOT need to be from different articles; they MUST cover only a fraction of the article (a single finding or maybe two).

Vote on a single headline.

What is textual data?

  • How do you turn text into data?
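Here is a minimal sketch of one common answer. It assumes the simplest possible representation, a "bag of words" that keeps counts and discards word order. The function names are illustrative and do not come from any particular library:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Turn a text into data: a table mapping each word to its count."""
    return Counter(tokenize(text))


counts = bag_of_words("The war will take our son! A sniper, or a shrapnel shell!")
print(counts["a"], counts["war"])  # prints: 2 1
```

Every choice here is a rule imposed on the text. This regex drops digits and hyphens, so a token like 02138 disappears entirely, and "Shifting" and "shifting" are merged into one word.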

Hand-coded textual data

Global Terrorism Database

Hand-coded data: read every newspaper article

(Some example criteria from the Global Terrorism Database)

  • The violent act was aimed at attaining a political, economic, religious, or social goal;
  • The violent act included evidence of an intention to coerce, intimidate, or convey some other message to a larger audience (or audiences) other than the immediate victims; and
  • The violent act was outside the precepts of International Humanitarian Law.

Textual "big data"

  • Unstructured
  • Large-scale
  • Similar enough to other data that you can use the same rules to process it.
  • Similar enough to other data that you can use the same rules to model it.
  • (Often) of unknown size: a collection of texts that can keep growing.

Advantages of "big data"

  1. You can study anything on the Internet!
  2. You can fill in known patterns and project them into the future in more detail.

Predicting election results from Twitter
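One naive approach treats each candidate's share of tweet mentions as a proxy for vote share. It is sketched here as an assumption, not as the method of any particular study. The tweets and candidate names below are hypothetical:

```python
import re
from collections import Counter


def mention_shares(tweets: list[str], candidates: list[str]) -> dict[str, float]:
    """Share of all candidate mentions going to each candidate.

    A tweet that names two candidates counts once for each of them.
    """
    counts = Counter()
    for tweet in tweets:
        words = set(re.findall(r"[a-z']+", tweet.lower()))
        for name in candidates:
            if name.lower() in words:
                counts[name] += 1
    total = sum(counts.values())
    return {name: (counts[name] / total if total else 0.0) for name in candidates}


# Hypothetical tweets mentioning two hypothetical candidates.
tweets = ["Vote Smith!", "Smith and Jones debate tonight", "jones is my pick", "rain again"]
print(mention_shares(tweets, ["Smith", "Jones"]))
```

Mentions are not votes. People who tweet are not a random sample of voters, and a mention can be hostile as easily as supportive.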

Pitfalls of textual "big data"

  1. Do you know what you're reading?
  2. With enough data, you need to know your errors.

Anachronisms

Mad Men in the early 1960s

The real Lincoln

Lincoln Anachronisms

LINCOLN

What's your name?

SECOND PATIENT

**Kevin.**

Anachronisms

…want to show the amendment has bipartisan support, you idiot.

The war will take our son! A sniper, or a shrapnel shell! Or typhus, same as took Willie, it takes hundreds of boys a day!
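One crude way to screen dialogue like this for anachronisms is to flag words that never occur in a reference corpus dated on or before the story's setting. This sketch rests on strong assumptions. The three-text corpus below is a hypothetical stand-in for a large dated collection, and a word's absence from a corpus is only a hint, never proof, that the word is anachronistic.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def possible_anachronisms(script: str, dated_corpus: list[tuple[int, str]], setting_year: int) -> list[str]:
    """Return words in `script` never seen in corpus texts dated on or before setting_year."""
    known = set()
    for year, text in dated_corpus:
        if year <= setting_year:
            known.update(tokenize(text))
    return sorted({word for word in tokenize(script) if word not in known})


# Hypothetical toy corpus of (year, text) pairs, invented for illustration.
corpus = [
    (1850, "what's your name"),
    (1862, "the war will take our son"),
    (1990, "my name is kevin"),
]
print(possible_anachronisms("What's your name? Kevin.", corpus, 1865))  # ['kevin']
```

Single words catch only the bluntest cases, such as an out-of-period name. Many anachronisms live in phrasing, which would require checking multi-word sequences against the corpus in the same way.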

How much fun?

Optical Character Recognition

What if it changes more than it should?
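One hedged reading of this question is that OCR errors can make a word's frequency change faster than real language change would allow. For example, OCR software often misreads the long s (ſ) of older printing as f. The sketch below flags year-to-year frequency changes larger than a chosen ratio. It is an assumed heuristic, not a standard method, and the frequencies are made up for illustration.

```python
def suspicious_jumps(freq_by_year: dict[int, float], max_ratio: float = 3.0) -> list[tuple[int, float]]:
    """Flag years whose frequency differs from the previous year's by more than max_ratio (up or down)."""
    years = sorted(freq_by_year)
    flagged = []
    for prev, curr in zip(years, years[1:]):
        a, b = freq_by_year[prev], freq_by_year[curr]
        if a == 0 or b == 0:
            # Appearing from zero or vanishing to zero is an unbounded change.
            if a != b:
                flagged.append((curr, float("inf")))
            continue
        ratio = max(a, b) / min(a, b)
        if ratio > max_ratio:
            flagged.append((curr, ratio))
    return flagged


# Made-up frequencies (occurrences per million words) for one word.
freqs = {1798: 1.0, 1799: 1.1, 1800: 9.0, 1801: 8.5}
print(suspicious_jumps(freqs))  # flags only 1800
```

A flagged jump is a prompt to look at the scans, not a verdict. Real events can also make a word spike, and the threshold is arbitrary.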

Ngrams: Zip Codes

02138
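Here is a minimal sketch of how a term such as 02138 can be charted over time, roughly in the spirit of the n-gram counts Michel et al. describe. The idea is to count how often the n-gram occurs in each year's texts and divide by the total number of n-grams of the same length that year. The documents and helper names are hypothetical, and the tokenizer deliberately keeps digits so that the ZIP code is not discarded.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Keep digits so that a token such as the ZIP code 02138 survives.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[str]:
    """All runs of n consecutive tokens, each joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(docs: list[tuple[int, str]], phrase: str) -> dict[int, float]:
    """Map each year to (occurrences of phrase) / (total n-grams of the same length that year)."""
    target_tokens = tokenize(phrase)
    if not target_tokens:
        raise ValueError("phrase has no tokens")
    target, n = " ".join(target_tokens), len(target_tokens)
    hits, totals = defaultdict(int), defaultdict(int)
    for year, text in docs:
        grams = ngrams(tokenize(text), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    # Years whose texts are too short to contain any n-gram of this length are omitted.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical documents, invented for illustration.
docs = [
    (1960, "Write to us at Box 12"),
    (1970, "Mail it to Cambridge MA 02138"),
    (1970, "ZIP code 02138 please"),
]
print(relative_frequency(docs, "02138"))  # {1960: 0.0, 1970: 0.2}
```

Dividing by each year's total, instead of reporting raw counts, keeps a term from appearing to rise just because more text survives from later years.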

Digitizing

Book scanning

Fingers

Hands

Unscannable books

Anachronisms