### Let's go wider #2

I finally found a paper that compares really deep networks with wider ones. As I somewhat expected, the answer is: let's go wider...

From the paper (notice better results with wider networks):

Wide network has better results

Wide :)

Publication:

Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun: Training Graph Neural Networks with 1000 Layers

And probably why:

P. Taraba: Linear Regression on a Set of Selected Templates from a Pool of Randomly Generated Templates, Machine Learning with Applications, Elsevier, August 2021

Google Research is going wider as well, even though they mention it more quietly:

Best results with CoAtNet-7

CoAtNet-7 is the widest...
("D denotes the hidden dimension (#channels)" from paper)
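To make "wider" a bit more concrete, here is a toy sketch of my own (not taken from either paper). It only counts the parameters of a plain fully connected network, so it says nothing about accuracy. The function name `mlp_param_count` and all layer sizes are made up for illustration. It shows that doubling the hidden dimension grows the hidden layers roughly quadratically, while doubling the depth grows them only linearly.

```python
def mlp_param_count(in_dim, hidden_dim, num_hidden_layers, out_dim):
    """Count weights and biases of a plain fully connected network.

    Hypothetical helper for illustration only; sizes below are arbitrary.
    """
    dims = [in_dim] + [hidden_dim] * num_hidden_layers + [out_dim]
    # Each layer has a (fan_in x fan_out) weight matrix plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(dims, dims[1:]))


base = mlp_param_count(64, 128, 4, 10)    # reference network
deeper = mlp_param_count(64, 128, 8, 10)  # 2x depth, same width
wider = mlp_param_count(64, 256, 4, 10)   # 2x width, same depth

print("base:  ", base)    # 59146
print("deeper:", deeper)  # 125194
print("wider: ", wider)   # 216586
```

So for the same "2x" knob, the wider variant ends up with noticeably more parameters than the deeper one in this toy setup, which is worth keeping in mind when comparing results across widths.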

'CoAtNet: Marrying Convolution and Attention for All Data Sizes' on arXiv

Also:

ImageNet benchmark