Priors and Prejudice in Thinking Machines

February 27, 2016

I’d like to start this post with a demonstration about human learning and generalization. Below is a training set which has been grouped into YES and NO images. Look at these, then decide which image from the test set should also be labelled YES.

That shouldn’t have been too difficult. In fact, I suspect even those kids in the training set would notice that the YES instances all contain two of the same thing. One might assume that learning this categorisation would come naturally to a Convolutional Neural Network. All the better, one could even start with a pretrained model already able to recognise the main objects in an image, then simply train a couple more layers on top for the final YES / NO classification. In truth, the matter is more subtle.

The reason a neural network succeeds at object recognition is that we specifically architect it for the job, building in our prior knowledge to guide the class of functions it can learn. We constrain it to use convolutions - in essence, to look for structures which are built up compositionally from parts, and which may be seen in many different locations. Without these constraints on the kinds of patterns it should look for, the original AlexNet network would have totalled around 80 billion parameters and learned nothing. Adding a little prior knowledge about images into the network’s architecture is precisely what allows it to generalize from ‘only’ a thousand examples per class.
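To see what this constraint buys, here is a back-of-the-envelope comparison (the shapes are only loosely based on AlexNet’s first layer, so treat the numbers as illustrative) of a convolutional layer against a fully connected layer mapping the same input to the same output:

```python
# Parameter count: convolutional layer vs. a fully connected layer
# producing the same output shape. Shapes loosely follow AlexNet's
# first layer; the exact figures are illustrative only.

in_h, in_w, in_c = 224, 224, 3      # input image
out_h, out_w, out_c = 55, 55, 96    # first-layer activation map
k = 11                              # kernel size

# Convolution: one small filter per output channel, shared across positions.
conv_params = k * k * in_c * out_c

# Fully connected: every input unit wired to every output unit.
dense_params = (in_h * in_w * in_c) * (out_h * out_w * out_c)

print(f"conv:  {conv_params:,}")    # tens of thousands
print(f"dense: {dense_params:,}")   # tens of billions, for one layer alone
```

Weight sharing collapses the layer from tens of billions of parameters to tens of thousands, which is the whole point: the prior does the work that data otherwise would have to.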

Unfortunately this particular architecture might not do so well at the task above, even if it were given a thousand YES / NO images to train its upper layers on. The convolutional structure is designed to learn about 2 dimensional arrangements of parts, but it has no baked-in preference for what those parts are or how they relate to each other. Specifically, there is nothing in its structure that makes sameness of parts a meaningful concept. Thus, this network could easily be trained to recognise the two ducks pattern or the two monkeys pattern, but still would have no reason to see a common property between them and generalize this to a new example of two otters.

If we wanted to train a neural network on the task above then we could easily modify the convolutional structure to understand ‘sameness’, but that is exactly my point. There is no such thing as general purpose induction, and if we want our models to generalize from data the way we do then they need to have the same inductive biases we have. That’s difficult because we have an awful lot of them.
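As a toy sketch of what such a modification might look like: restrict the classifier so that it can only see the similarity between two object embeddings, so any pair of identical objects looks alike regardless of what the objects are. Here `embed` stands in for a pretrained feature extractor, and the prototype vectors are made up:

```python
import numpy as np

# 'Sameness' baked into the architecture: the model is only allowed
# to compare the similarity of two object embeddings, never to learn
# an arbitrary function of the pair.

rng = np.random.default_rng(0)
prototypes = {"duck": rng.normal(size=8),
              "monkey": rng.normal(size=8),
              "otter": rng.normal(size=8)}

def embed(obj):
    # Stand-in for a pretrained feature extractor.
    return prototypes[obj]

def same_score(a, b):
    ea, eb = embed(a), embed(b)
    return ea @ eb / (np.linalg.norm(ea) * np.linalg.norm(eb))

def classify(a, b, threshold=0.99):
    return "YES" if same_score(a, b) > threshold else "NO"

print(classify("duck", "duck"))    # YES
print(classify("otter", "otter"))  # YES, though no otter pair was ever seen
print(classify("duck", "monkey"))  # NO
```

Because ‘same’ is defined structurally rather than learned per-category, the network generalizes to two otters for free.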

Evolution has not just endowed our brain with a stew of 86 billion neurons (impressive, albeit a third as many as an elephant’s). It has also shaped the architecture to an enormous degree so that, before even looking at the world, we come prepared with just the right structure for learning about certain things. Just as CNNs are born to learn 2-dimensional arrangements of parts, our brains seem innately primed for understanding such concepts as motion, faces, spatial layout of the environment, motor control, people and animals, speech, language, and even thinking about thinking.

In fact, an increasingly compelling body of neuroimaging studies has found brain regions specific to each of these, and some abilities such as face detection and imitation may even be present at birth. This of course does not discredit the idea that such abilities can be learned from enough data with fairly general purpose tools, but it does suggest that evolution may have given us a serious kick-start in these areas.

A ten-minute old infant imitates his father sticking out his tongue

If we are to create truly intelligent machines, we will need a way to build much richer structure into our models than is found in CNNs - the kind of structure that we are born with. This is a tremendously difficult problem which, I believe, will need to be attacked jointly from two sides:

1. Use our prior knowledge to design new model structures

This knowledge may come from our intuitions, or from research in cognitive science and neuroscience about how the brain solves certain problems. In either case, how we can bake such knowledge into our models is heavily shaped by the tools we use to build them.

One toolbox we have available includes probability models, programming languages, and especially probabilistic programming languages. These provide a powerful way to write down all of our prior knowledge in a high-level and expressive language, so that our models can generalize well from just a few examples. Brenden Lake’s Science Paper on one-shot learning is a terrific display of what probability models can achieve where deep learning struggles. Of course the difficulty with these models is that, as they grow in complexity, actually using them for inference can become prohibitively slow. Thus, finding fast approximate inference techniques is an important area of research - one with some exciting frontiers that I hope to explore in my next post.
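A miniature illustration of the idea - not Lake et al.’s model, just a made-up number-concepts example in plain Python - shows how a prior over hypotheses plus Bayes’ rule can generalize sharply from a single example:

```python
# A toy probabilistic model: a prior over simple number concepts, and
# exact Bayesian inference from one observed example.

concepts = {
    "even":            lambda n: n % 2 == 0,
    "odd":             lambda n: n % 2 == 1,
    "powers_of_2":     lambda n: (n & (n - 1)) == 0,
    "multiples_of_10": lambda n: n % 10 == 0,
}
universe = range(1, 101)

def likelihood(concept, example):
    # Size principle: an example is drawn uniformly from the concept's
    # extension, so smaller (more specific) concepts score higher.
    members = [n for n in universe if concepts[concept](n)]
    return (1 / len(members)) if example in members else 0.0

def posterior(example):
    prior = 1 / len(concepts)  # uniform prior over concepts
    scores = {c: prior * likelihood(c, example) for c in concepts}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# A single example, 16, already favours the narrow 'powers of 2' concept
# over the much larger 'even' concept.
post = posterior(16)
print(max(post, key=post.get))   # powers_of_2
```

The strong generalization comes entirely from the prior knowledge written into the hypothesis space, which is the trade the probabilistic approach makes: expressive priors in exchange for (in bigger models) expensive inference.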

Deep learning takes the opposite approach: the fast learning rule of stochastic gradient descent allows us to provide much weaker priors, so long as we are willing to make up for it with data. Unfortunately, this approach also makes it very difficult to include stronger prior knowledge when we'd like to. We are lucky that convolutions are such a natural way to provide networks some basic rules¹ for image recognition (so natural that the idea is much older than I am), yet it is much less obvious what kind of architectural decisions might guide a network to understand objects in terms of moveable parts, joints and surfaces, for example. This difficulty of incorporating prior knowledge into neural networks is one of their biggest weaknesses.

2. Build systems which discover structure from data

The probabilistic learning community has produced some wonderful research in this area. I have a particular fondness for Roger Grosse’s work on matrix decompositions, which can discover a wide range of model structures by composing simpler elements together. The Automatic Statistician takes a similar approach with Gaussian process kernels, and uses this to produce beautiful PDF reports of the patterns in any given dataset. Still, both systems are fairly coarse-grained, making use of just a few large building blocks, and so are somewhat restricted in the type of model structures they can learn.
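To give a flavour of the kernel-composition idea (a toy numpy sketch, not the Automatic Statistician’s actual grammar): base kernels combine by addition and multiplication, and the result is still a valid kernel describing a richer pattern, such as ‘locally smooth and periodic’:

```python
import numpy as np

# Composing Gaussian process kernels from simple building blocks.

def rbf(x, y, ls=1.0):
    # Smoothness: nearby inputs have similar outputs.
    return np.exp(-0.5 * (x - y) ** 2 / ls ** 2)

def periodic(x, y, period=1.0, ls=1.0):
    # Repetition: inputs one period apart have similar outputs.
    return np.exp(-2 * np.sin(np.pi * (x - y) / period) ** 2 / ls ** 2)

def gram(kernel, xs):
    return np.array([[kernel(a, b) for b in xs] for a in xs])

# Sums and products of kernels are again kernels; this one describes
# a periodic pattern whose shape drifts slowly over time.
smooth_periodic = lambda x, y: rbf(x, y, ls=5.0) * periodic(x, y)

xs = np.linspace(0, 10, 50)
K = gram(smooth_periodic, xs)
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min() > -1e-9)   # True: still positive semidefinite
```

A search over such compositions is what lets these systems name the structure they find (‘a smooth trend plus yearly periodicity’) rather than just fit it.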

When the building blocks are more primitive this idea starts to look more like ‘program synthesis’ - automatically learning the code of a (probabilistic) program from examples of desired output - and it typically involves a stochastic search not unlike the mechanism of evolution. This is very difficult to do well from scratch, but if enough prior knowledge of a program’s overall structure is included (a technique called sketching) then it is possible to successfully fill in the blanks.
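A miniature, made-up example of the sketching idea: fix the program’s overall shape in advance, and let a simple search fill in the holes from input/output examples.

```python
import itertools

# Sketching in miniature: the program's shape is fixed as
# f(x) = a * x + b, and search only has to fill in the integer
# holes a and b so that all examples are satisfied.

examples = [(0, 3), (1, 5), (2, 7)]   # (input, expected output) pairs

def synthesize(examples, hole_range=range(-10, 11)):
    for a, b in itertools.product(hole_range, repeat=2):
        if all(a * x + b == y for x, y in examples):
            return a, b
    return None

a, b = synthesize(examples)
print(f"f(x) = {a}*x + {b}")   # f(x) = 2*x + 3
```

The sketch is doing exactly the job of a prior: it shrinks the search space from all programs to a tiny family, which is what makes the search tractable.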

To learn a similar structure for a neural network, one could again define a constrained set of architectures and search over this space. However, what I find much more interesting is a new direction of research, aiming to abstract away the structure of the computation itself.

The Neural Programmer-Interpreter is one such model. A recurrent neural network is given access to a persistent memory module and uses it to store 'programs' in a kind of learnable code. The network is then able to read these programs and execute them on new inputs, train them using supervision at various levels of abstraction, and compose them together into new larger programs. By encoding the computation this way, rather than as weights in a fixed graph, the network can learn not just the parameters of an algorithm but also its structure. In a strong sense this is the neural analog of program synthesis, and I find it an exciting direction.

How can we build a mind with the right structure to learn about the world? I'm sure this great challenge will require many breakthroughs over many decades, but I have a feeling that in the next few years we will finally have the right tools to at least get a handle on it: systems which can integrate our rich prior knowledge of the world when we provide it, while teaching themselves from data when we don’t. 


¹ Formally, features learned by a CNN are translation equivariant. Cohen and Welling show that a generalisation of CNNs can provide equivariance to the larger group of translations, axis-aligned reflections and 90-degree rotations.
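Translation equivariance is easy to check numerically; here is a 1-D sketch, using a circular (wrap-around) convolution to keep the boundary simple:

```python
import numpy as np

# Translation equivariance: convolving a shifted signal gives the
# shifted convolution, i.e. conv(shift(x)) == shift(conv(x)).

rng = np.random.default_rng(1)
x = rng.normal(size=16)   # input signal
w = rng.normal(size=5)    # filter

def circ_conv(x, w):
    n = len(x)
    return np.array([sum(w[k] * x[(i + k) % n] for k in range(len(w)))
                     for i in range(n)])

def shift(v, s):
    return np.roll(v, s)

lhs = circ_conv(shift(x, 3), w)   # shift, then convolve
rhs = shift(circ_conv(x, w), 3)   # convolve, then shift
print(np.allclose(lhs, rhs))      # True
```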

