Keywords: deep learning; simplicity; statistical physics
We present a theory for the output distribution of deep-layered machines and show that, as network depth increases, it becomes biased towards simple outputs.
Deep-layered machines have a built-in Occam’s razor
Draft (2026)
Input-output maps are prevalent throughout science and technology. They are empirically observed to be biased towards simple outputs, but the origin of this bias is not well understood. To address this puzzle, we study the archetypal input-output map: a deep-layered machine in which every node is a Boolean function of all the nodes below it. We develop a mathematical theory for the distribution of outputs and confirm its predictions through extensive computer experiments. As the network depth increases, the distribution becomes exponentially biased towards simple outputs. This suggests that deep-layered machines, and other learning methodologies, may be inherently biased towards simplicity in the models they generate.
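As a concrete illustration of the setup described in the abstract, the following Python sketch (ours, not the authors') simulates such a layered machine under the assumption that "below" means the immediately preceding layer: every node applies an independent, uniformly random truth table to the full state of the layer beneath it, and the zlib-compressed length of the resulting 2^width-bit output string serves as a crude proxy for output complexity. The width, depths, and trial counts are arbitrary illustrative choices.

```python
import random
import zlib

def random_layer(width):
    """One uniformly random Boolean function (truth table) per node, composed
    into a single lookup table from a layer's state index to the next one."""
    tables = [[random.randint(0, 1) for _ in range(2 ** width)]
              for _ in range(width)]
    return [sum(tables[j][idx] << j for j in range(width))
            for idx in range(2 ** width)]

def network_output(width, depth):
    """Propagate every possible input through a fresh random layered machine
    and read one output node; the 2**width-bit string is the computed map."""
    states = list(range(2 ** width))   # every input, encoded as a state index
    for _ in range(depth):
        layer = random_layer(width)
        states = [layer[s] for s in states]
    return "".join(str(s & 1) for s in states)  # output bit = node 0

def complexity(bits):
    """Crude simplicity proxy: zlib-compressed length of the output string."""
    return len(zlib.compress(bits.encode()))

random.seed(0)
width = 7                              # 7 Boolean nodes per layer, 2**7 inputs
for depth in (1, 8, 32, 128):
    samples = [complexity(network_output(width, depth)) for _ in range(100)]
    print(f"depth {depth:3d}: mean compressed length "
          f"{sum(samples) / len(samples):5.1f} bytes")
```

If the abstract's claim carries over to this toy setting, the mean compressed length should fall as depth grows, reflecting an increasing bias towards simple (more compressible) output functions.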