The mathematical design of experiments for cell programming

Developing the mathematical structure of experiments using information theory and combinatorics to speed up the discovery of new cell types.

Programming cells to adopt a particular cell type requires activating a small subset of genes. For most desirable cell types, we don’t know what these subsets are. Searching the space of gene combinations is a difficult experimental problem, because the space of combinations is large and because experiments are subject to noise, which complicates their interpretation.

In this project we reduce the experimental process to its mathematical core, mapping the problem of finding subsets of genes to a communication problem. Each cell in an experiment can be treated as a noisy message that carries information about the subset of genes for the desired cell type. By combining many of these messages, we can infer the subset that generated them. This problem has its roots in combinatorial design during WWII, but the arena of cell programming inspires generalisations that require new techniques to analyse.

By establishing bounds from the noisy channel coding theorem, we can estimate the number of cells that experimenters must collect in order to find the desired subsets of genes. By maximising the number of bits communicated per cell, biologists can design more informative experiments and find new cell types whose discovery would otherwise be infeasible.
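
To make the counting argument concrete, here is a minimal sketch of this kind of bound under a deliberately simplified model that is our assumption, not the project's actual analysis. Suppose the target subset is one of the C(N, k) ways to choose k genes out of N, and each cell yields a single binary readout that is flipped with some probability (a binary symmetric channel). Identifying the subset then takes at least log2 C(N, k) bits, and each cell conveys at most the channel capacity, so the ratio gives a lower bound on the number of cells. The function names and parameters are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bsc_capacity(flip_prob: float) -> float:
    """Capacity in bits per use of a binary symmetric channel."""
    return 1.0 - binary_entropy(flip_prob)


def min_cells(num_genes: int, subset_size: int, flip_prob: float) -> int:
    """Lower bound on the cells needed to identify one gene subset.

    Assumes (hypothetically) one binary readout per cell, flipped
    independently with probability flip_prob.
    """
    # Bits required to single out one subset among all candidates.
    bits_needed = math.log2(math.comb(num_genes, subset_size))
    capacity = bsc_capacity(flip_prob)
    if capacity <= 0:
        raise ValueError("a channel with zero capacity conveys no information")
    return math.ceil(bits_needed / capacity)


if __name__ == "__main__":
    # Illustrative numbers only: 20 candidate genes, subsets of size 3.
    for noise in (0.0, 0.1, 0.3):
        print(noise, min_cells(20, 3, noise))
```

The bound grows as the readout becomes noisier, which is the sense in which maximising the bits communicated per cell reduces the number of cells an experiment needs. Real cell readouts are richer than a single bit, so this should be read as an illustration of the reasoning rather than as a usable estimate.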
