
Deep vs Wide

Deep learning publications currently dominate machine learning, overshadowing work on wide networks. Setting current trends aside and looking only at accuracy, however, the differences may not be as big as one might think.

Dataset | Deep Networks | Wide Networks
TIMIT & CIFAR-10 | Acoustic modeling using deep belief networks | Do Deep Nets Really Need to be Deep?
NORB & CIFAR-10 | Learning methods for generic object recognition with invariance to pose and lighting | An analysis of single-layer networks in unsupervised feature learning
MNIST & ADS | Stochastic pooling for regularization of deep convolutional neural networks | Linear Regression on a Set of Selected Templates from a Pool of Randomly Generated Templates [under review] & slides for On Linear Regression

Motivation: Suppose we have 9 features, each correct with the same probability p, where p is in (0.5, 1.0]. What is the probability of a correct output when we combine all 9 features at once by majority vote (blue line), and what is it when we combine them in a deep manner, where each group of 3 features votes to create 1 new feature and the 3 new features then vote together (red line)?

Python script for the above graph. The blue line is above the red line for every p strictly between 0.5 and 1; the two lines meet at p = 1.
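The comparison can be sketched in a few lines. This is a minimal sketch, not the script linked above: it assumes the features are independent, and the function names `majority`, `wide_majority` and `deep_majority` are hypothetical. It prints the two probabilities for a handful of values of p rather than drawing the graph.

```python
from math import comb


def majority(p, n):
    """Probability that more than half of n independent features are correct.

    Each feature is assumed to be correct with probability p; n must be odd.
    """
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n // 2 + 1, n + 1))


def wide_majority(p):
    """Wide combination: a single majority vote over all 9 features."""
    return majority(p, 9)


def deep_majority(p):
    """Deep combination: 3 groups of 3 vote, then the 3 group results vote."""
    q = majority(p, 3)  # probability that one group of 3 is correct
    return majority(q, 3)


if __name__ == "__main__":
    for p in (0.55, 0.6, 0.7, 0.8, 0.9, 1.0):
        print(f"p={p:.2f}  wide={wide_majority(p):.4f}  deep={deep_majority(p):.4f}")
```

In the deep variant a group of 3 can outvote the rest with only 2 correct features per winning group, so 4 correct features out of 9 can be enough for a right answer and 5 correct features can still lose. The wide vote makes neither kind of error, which is why it comes out ahead under these assumptions.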

Now what if we go to infinity?

As it turns out, if we can generate an unlimited number of features, each with a probability of correctness from (0.5, 1.0], all we have to do is count how many features vote for and how many vote against.

Instead of a technical proof, here is a graph showing how the sum of the first k+1 terms in the expansion of (p+(1-p))^(2k+1), which is the probability that the majority of 2k+1 features is correct, evolves with increasing k:

And here is a Python script for generating this graph.
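The quantity on that graph can also be computed directly. The following is a minimal sketch rather than the linked script: it assumes independent features with a common p, the function name `majority_correct` is hypothetical, and it prints values instead of plotting them. The i-th term of the expansion, C(2k+1, i) p^(2k+1-i) (1-p)^i, is the probability that exactly i of the 2k+1 features are wrong, so the first k+1 terms cover every case in which the majority is right.

```python
from math import comb


def majority_correct(p, k):
    """Sum of the first k+1 terms of the expansion of (p + (1-p))^(2k+1).

    Term i is the probability that exactly i of the 2k+1 features are wrong,
    assuming each feature is independently correct with probability p.
    """
    n = 2 * k + 1
    return sum(comb(n, i) * p**(n - i) * (1 - p)**i for i in range(k + 1))


if __name__ == "__main__":
    p = 0.6
    # k is kept modest here: for very large k the binomial coefficient
    # no longer fits into a float and the product would overflow.
    for k in (0, 1, 4, 10, 50, 200):
        print(f"k={k:3d}  features={2 * k + 1:3d}  P(majority correct)={majority_correct(p, k):.6f}")
```

With k = 0 the sum is just p, and with k = 4 it is the 9-feature majority vote from the motivation above.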