So, I tried a very simple experiment, where the parameters of the convolution layers are initialized randomly and not trained at all. The results on the MNIST dataset are better than I would expect.
A 0 at the end for a convolution layer means a step size of 0, i.e., that layer is not trained. Only the parameters of the other layers are trained. Trained with WideOpenThoughts.
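For readers who want to try the same idea in a more common framework, here is a minimal sketch in PyTorch (an assumption on my part; the original run used WideOpenThoughts). Freezing the convolution's parameters plays the role of the step size of 0 above; the layer sizes and hyperparameters are illustrative, not the exact setup.

```python
# Sketch: frozen random convolution + trained classifier head (PyTorch).
# Architecture is an assumption, not the exact WideOpenThoughts configuration.
import torch
import torch.nn as nn

class RandomConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Randomly initialized convolution, never trained.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5),  # 28x28 -> 24x24
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 24x24 -> 12x12
        )
        for p in self.conv.parameters():
            p.requires_grad = False  # equivalent to a step size of 0
        # Only this head receives gradient updates.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, 10),
        )

    def forward(self, x):
        return self.head(self.conv(x))

model = RandomConvNet()
# The optimizer only sees the trainable (non-frozen) parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```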
Results (above 99% accuracy on MNIST):