Postdoc in machine learning and inference
21 JUN 2020
The London Institute for Mathematical Sciences is hiring a Postdoctoral Research Fellow in machine learning and inference. This follows a recent partnership with cell coding company Bit Bio on a moonshot mission to program every human cell type for use in biomedical research.
The position starts as soon as possible, with a gross salary of £42,000 per year.
The London Institute is assembling a team of theorists to decode the dynamics of cell programming. The postdoc will work with Thomas Fink, with a senior theorist who is being recruited at the same time, and with Bit Bio founder Mark Kotter. The postdoc will help determine theoretical lines of attack for interpreting cell programming data to unravel the operating system of life.
Candidates should have a PhD in physics, theoretical computer science or mathematics, and a promising track record of research. They should have experience in some of the following: statistical physics, the theory of neural networks, Bayesian inference, disordered systems, high-dimensional inference and causal inference. Familiarity with cellular processes is advantageous but not necessary. Candidates should write well and value collegiality and intellectual adventure.
The London Institute for Mathematical Sciences is a private academic institute for curiosity-driven research in physics, mathematics and the theoretical sciences. Funded by research agencies, foundations and firms, it gives scientists the freedom and support to make fundamental discoveries full-time.
Plausible approaches include, but are not limited to, the following:
Iterative data analysis and machine learning
In the collaboration between LIMS and Bit Bio, theory and experiment will be tightly coupled. The types and amount of data generated will be iteratively shaped by emerging theoretical insights. Neural networks provide a coarse tool for early insight into Bit Bio’s transcription factor perturbation experiments. Their success will depend on tailoring the learning algorithm to the details of the experiments. Faced with partial information about which genes communicate with which others, network inference and community detection can suggest candidate genes for more focused experiments. As we gain insight into the structure of cell programming, these heuristic approaches will set the stage for more mathematically tractable lines of attack.
High-dimensional and causal inference
Classical statistics was developed for settings in which the number of variables being estimated is fixed. But in genetic regulatory systems, the number of variables being estimated tends to grow with the amount of data. Inference in this high-dimensional regime can confuse noise with real patterns, a breakdown known as overfitting. Statistical physics can be used fruitfully here, because it predicts the degree of uncertainty in the dangerous overfitting regime. Crucially, this provides a means to discriminate between reliable and spurious inferences. To reconstruct the direction of influence between genes, techniques from the emerging field of causal inference will also play a role.
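As a rough illustration of how noise can pass for signal when variables outnumber samples, the Python sketch below draws variables that are pure noise and reports the strongest sample correlation any of them shows with an unrelated target. The sample size, the variable counts and the function names are assumptions chosen for illustration; none of them is taken from Bit Bio's experiments.

```python
import random


def correlation(x, y):
    """Pearson sample correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_samples, variable_counts, seed=0):
    """Largest |correlation| between a noise target and the first p noise variables.

    Every variable is independent Gaussian noise, so any correlation found
    is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    strengths = []
    for _ in range(max(variable_counts)):
        variable = [rng.gauss(0, 1) for _ in range(n_samples)]
        strengths.append(abs(correlation(variable, target)))
    # Screening more variables can only raise the best spurious match.
    return {p: max(strengths[:p]) for p in variable_counts}


# Hypothetical setting: 20 samples, screened against 5, 50 and 500 variables.
results = strongest_spurious_correlation(20, [5, 50, 500])
for p, strength in results.items():
    print(f"{p:4d} noise variables: strongest |correlation| = {strength:.2f}")
```

Because nothing in this toy data is real signal, the strongest match can only grow as more variables are screened. Telling such artefacts apart from genuine structure is the problem described above.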
Cell regulatory networks and neural networks
Bit Bio has successfully rewired cells by switching on a handful of carefully selected transcription factors. If one writes down the equations governing the entire system of transcription factors, they turn out to be curiously similar to the equations describing interactions between neurons in the brain. In this analogy, stable neuron firing patterns are the equivalent of stable cell types. And we have in recent decades learned a lot about how to “reprogram” brain models by the clever modification of local variables. Adapting this mathematical insight to transcription, and combining it with Bit Bio’s experimental innovations, will offer a strategic head start in the quest to reprogram cells.
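The analogy can be made concrete with a Hopfield-style network, a standard toy model of neural memory in which stored patterns become stable states. The sketch below is only an illustration under strong assumptions: the 16 binary units stand in for transcription factors that are on or off, the two stored patterns stand in for two cell types, and the names `type_a` and `type_b` are hypothetical. It is not a model of Bit Bio's data.

```python
def hebbian_weights(patterns):
    """Build symmetric couplings that make each stored pattern a stable state."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-coupling
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def relax(weights, state, max_sweeps=10):
    """Update units one at a time until no unit wants to change."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            field = sum(weights[i][j] * s[j] for j in range(n))
            # A unit aligns with its local field; ties leave it unchanged.
            new_value = s[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != s[i]:
                s[i] = new_value
                changed = True
        if not changed:
            break
    return s


# Two hypothetical "cell types" as on/off (+1/-1) expression patterns.
type_a = [1] * 8 + [-1] * 8
type_b = [1, -1] * 8
weights = hebbian_weights([type_a, type_b])

# Perturb three units of type A and let the dynamics settle.
perturbed = list(type_a)
for i in (0, 5, 10):
    perturbed[i] = -perturbed[i]
recovered = relax(weights, perturbed)
print(recovered == type_a)
```

In this toy setting both patterns are fixed points of the dynamics, and a state a few flips away from `type_a` relaxes back to it. This is the sense in which stable firing patterns play the role of stable cell types.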
Cell subroutines and combinatorial innovation
There is mounting evidence that cells possess interoperable subroutines that can be combined to perform a variety of tasks. Combining modular subroutines in different ways is a powerful shortcut for realizing new functionality fast. For an analogy, think of a library of actual software modules. These are not self-contained pieces of code; rather, each module calls on other modules to perform its task. So new subroutines both call on, and can be called upon by, other subroutines. Dynamics in these expanding spaces are path-dependent and defy traditional notions of equilibrium. Tractable models of combinatorial innovation will help unravel the architecture of genetic regulatory networks in the cell and suggest mechanisms for disrupting pathological behaviours.