Biocurious is a weblog about biology, quantified.

Bill Bialek on theory in the age of Big Data

by Andre on 29 September 2014

One of the most talked-about Big Data projects in the last year or so has been the BRAIN initiative in the US. It was prompted by the incredible advances that are being made in technologies to image and manipulate neural activity optically. In its early incarnation, the goal was to record the activity of every neuron in a brain over time. With Billions and Billions of neurons, that would most assuredly lead to a big pile of data.

In this context, Bill Bialek was invited to give some opening remarks on the second day of the Kavli/NSF symposium on the BRAIN initiative. Thankfully, they’ve made the video available, and it’s one of the most lucid expositions of the essential role that theory still plays in the age of Big Data that I’ve seen. He focusses on the brain and behaviour, but you could substitute cell biology and ‘omics without changing the message much.

Highlights from the talk:

Data mining [is] popular. But miners know gold when they see it. Theory is the source of explicit, testable hypotheses about what is golden in your data.

He goes on to make an important related point: even the algorithms you choose to apply represent implicit hypotheses about the data and strongly shape what you will find and how you will be able to interpret it.
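To make that concrete, here is a toy sketch of my own (not something from the talk). The data and the `least_squares_slope` helper are invented for illustration. The points lie exactly on a parabola, so y is completely determined by x, yet a straight-line fit, which quietly assumes the relationship is linear, reports no trend at all.

```python
# Illustrative toy data (an assumption, not from the talk):
# y depends entirely on x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares straight line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


slope = least_squares_slope(xs, ys)
print(slope)  # 0.0: the linear fit sees no relationship
```

The structure is there in the data; the choice of a linear model is what hides it.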

If the goal is to explain behaviour in terms of neurons, synapses, molecules… we run the risk that the ingredients of explanation will outstrip the phenomena we are trying to explain.

So we need better quantitative characterisations of behaviour and better theories of what brains do.

Suppose I showed you a movie of what 10 000 water molecules are doing as they wander around in that liquid. I do not believe that by staring at hours of that movie you would ever induce the concept of wet. You can’t just look! It doesn’t work.

Bigger data will never solve this.

