Big Data and the Sciences

2020-05-04

Big Data and the Server Age

How is Big Data different from data?

  • There’s more of it.
  • It lacks structure.
  • It lacks statistical sampling.


From http://ai.stanford.edu/~ang/papers/icml12-HighLevelFeaturesUsingUnsupervisedLearning.pdf


Try 2

  • Supervised: learning from training data that carries labels.
  • Unsupervised: attempting to learn intrinsic patterns from data without labels.
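
The distinction above can be made concrete with a toy sketch. Everything here is an assumption made for illustration: the six data points, the "small"/"large" labels, and the function names are hypothetical, and the two methods (a nearest-centroid classifier and 2-means clustering) are simply convenient stand-ins for the two families, not methods taken from the lecture.

```python
from statistics import mean

# Toy 1-D measurements (hypothetical data, invented for illustration).
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["small", "small", "small", "large", "large", "large"]


def fit_supervised(xs, ys):
    """Supervised: use the labels to compute one centroid per class."""
    classes = sorted(set(ys))
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in classes}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))


def fit_unsupervised(xs, iterations=10):
    """Unsupervised: 2-means clustering; no labels are consulted."""
    centers = [min(xs), max(xs)]  # simple deterministic initialisation
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old centre if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


centroids = fit_supervised(points, labels)
print(predict(centroids, 4.0))    # classifies a new point using learned labels
print(fit_unsupervised(points))   # two cluster centres found without labels
```

The supervised version can name its answer ("large") because it was given names to learn from. The unsupervised version only reports that the data fall into two groups; deciding what those groups mean is left to the analyst.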

With Supervision

Big Data is the precondition for modern “Machine Learning.”

  • 1990-2010: Large datasets built, widespread use of various algorithms for prediction.
  • 2013-present: Widespread use of neural networks trained with backpropagation.

Big Data before AI

Large Hadron Collider

Human Genome Project: private-public rivals; Craig Venter, Celera Genomics.

Venter’s method

Source: Adam Kucharski, www.sbs.com.au

Netflix Prize

Netflix’s goal, c. 2014

For DVDs our goal is to help people fill their queue with titles to receive in the mail over the coming days and weeks; selection is distant in time from viewing, people select carefully because exchanging a DVD for another takes more than a day, and we get no feedback during viewing. For streaming members are looking for something great to watch right now; they can sample a few videos before settling on one, they can consume several in one session, and we can observe viewing statistics such as whether a video was watched fully or only partially.

Server Farms

Street view of data center

YouTube…