Inference in many dimensions
Developing a theory of high-dimensional statistical inference using analytic tools from the statistical physics of disordered systems.
In modern statistical inference, the number of inferred parameters often grows with the size of the data set. Inference in this high-dimensional regime is a serious challenge for classical statistics, which was developed for settings where the number of parameters is fixed. As a result, high-dimensional inference is often applied without an understanding of how it works. In the words of da Vinci, ‘he who loves practice without theory is like a sailor who boards a ship without rudder or compass and knows not where he is cast’.
In this project we use analytic tools from the statistical physics of disordered systems to develop a theory of high-dimensional statistical inference. The parameters of the statistical model play the role of the degrees of freedom, and the negative log-likelihood of their values plays the role of energy, so that the most likely parameter values are those of lowest energy. The inferred parameters therefore correspond to the lowest-energy state. Among other techniques, we harness the replica trick to average over the disorder embodied by the data.
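The correspondence described above can be written out schematically. The following is only a sketch in assumed notation, not the project's own formulation: the symbols \theta (parameters), D (data), \beta (inverse temperature), Z (partition function) and n (number of replicas) are introduced here for illustration, and the energy is taken to be the negative log-likelihood.

```latex
% Energy of a parameter configuration \theta given data D (assumed notation)
E(\theta \mid D) = -\log p(D \mid \theta)

% Maximum-likelihood inference is the search for the lowest-energy state
\hat{\theta} = \arg\min_{\theta} E(\theta \mid D)
             = \arg\max_{\theta} p(D \mid \theta)

% Partition function at inverse temperature \beta;
% as \beta \to \infty the Gibbs measure concentrates on \hat{\theta}
Z(D) = \int \mathrm{d}\theta \; e^{-\beta E(\theta \mid D)}

% Replica trick: the average of \log Z over the data (the disorder)
% is obtained from the moments of Z
\langle \log Z \rangle_{D} = \lim_{n \to 0} \frac{\langle Z^{n} \rangle_{D} - 1}{n}
```

The last identity is what makes the disorder average tractable: the moments \langle Z^n \rangle_D are computed for integer n and the result is then continued to n \to 0.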
A current problem with high-dimensional inference is its propensity to generate unsubstantiated and often unfalsifiable conclusions. A foundational theory will help to discriminate between reliable and unreliable inference, and will inform the design of efficient inference algorithms.
We optimize Bayesian data clustering by mapping the problem to the statistical physics of a gas and calculating the lowest entropy state.