Decoding cell programming
27 JUN 2020
Bit.Bio has partnered with the London Institute for Mathematical Sciences on a moonshot mission to create every human cell type for use in biomedical research.
For the most ambitious projects, you want to assemble a crack team, with exceptional complementary skills. And it would be hard to conceive of a more revolutionary project than the one being tackled by the British biomedical startup, Bit.Bio.
The biologists at the cell coding company, which was named Cambridge University Startup of the Year in 2018, are on a moonshot mission that could open the door to a new generation of medicines. Their extraordinary aim is to learn the genetic programming codes for creating large-scale, high-purity batches of human cells for use in biomedical research. And it’s not just a few cell types they’re after. It’s every cell type in the human body.
To expedite their mission, Bit.Bio is now joining forces with the London Institute for Mathematical Sciences. Theorists from the academic research centre will build on the biologists’ expertise and experience, contributing insights from machine learning and mathematical modelling.
Bit.Bio has already logged some astonishing successes. The company was founded in 2018 by Cambridge neurosurgeon Dr Mark Kotter, but its story began years earlier, when Dr Kotter became frustrated by the inefficiencies of testing drug therapies on animal cells. Convinced that the answer lay in finding scalable ways to create human cells, he started experimenting with cutting-edge techniques for reprogramming stem cells: proto-cells capable of turning into any other cell type.
By switching on carefully selected genes known as “transcription factors”, which govern cellular development, Kotter and his colleagues rewired the cells, directing them to adopt new identities. In 2012, they successfully used iPS cells—induced pluripotent stem cells, a synthetic, ethically uncontroversial kind of stem cell—to create a batch of a type of brain cell called oligodendrocytes. But there was still one problem: only a tiny fraction of the stem cells underwent the required transformation.
The biologists deduced that the reason for this was a protective mechanism in the iPS cells known as “gene silencing”. This meant that, when they activated the transcription factors, most of the cells responded by deactivating them. Undeterred, they got to work finding a way to circumvent gene silencing. Their quest led to the development of Opti-Ox, a patented technique that inserts the genetic instructions into more receptive parts of the genome known as genetic safe harbours. It was a game-changer, and Bit.Bio was born.
Since then, the company has successfully created large-scale, high-purity batches of neurons, muscle cells and oligodendrocytes, an achievement that would have been unthinkable only a few years ago. The crucial point is that Bit.Bio’s breakthrough technique could in principle be used to create any human cell type. Identifying the right transcription factors, however, is a monumental task.
For an analogy, imagine you have just discovered the fuse box in a new home. There are a dozen switches but no labels telling you what they do. To reverse-engineer the instruction manual, you first turn on all the lights and appliances in the house. Then you throw the first switch in the fuse box and patrol the house to note the results. Next, you do the same with the second switch. Then the third, and so on. And how about a combination of two or three switches at once? Now imagine the fuse box doesn’t have 12 switches. It has thousands.
That gives an idea of the magnitude of the challenge. Every human cell, whether it be skin, muscle or neuron, contains the same set of roughly 20,000 genes. Of these, more than 2,000 are potential transcription factors. On average, each cell type is encoded by a combination of just a handful of factors. But given the time and money it takes to test just one factor combination, a brute-force approach is impossible: there are far too many combinations to test them all. Instead, the trick is to test each factor individually, along with a small fraction of the most likely combinations, and then use sophisticated tools from mathematics to make the most of the information they yield. Enter the London Institute.
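To put rough numbers on that explosion, here is a minimal counting sketch. It takes the figure of 2,000 candidate factors from the text and assumes, purely for illustration, that "a handful" means at most five factors per cell type; the names N_FACTORS, MAX_COMBO_SIZE and count_combinations are invented for this example and do not describe Bit.Bio's actual screening strategy.

```python
from math import comb

N_FACTORS = 2000      # "more than 2,000" potential transcription factors (from the text)
MAX_COMBO_SIZE = 5    # assumption: "a handful" is read as at most five factors


def count_combinations(n_factors: int, max_size: int) -> dict:
    """Count the distinct unordered sets of factors for each set size."""
    return {size: comb(n_factors, size) for size in range(1, max_size + 1)}


counts = count_combinations(N_FACTORS, MAX_COMBO_SIZE)
for size, n_sets in counts.items():
    print(f"{size} factor(s): {n_sets:,} possible combinations")
print(f"total: {sum(counts.values()):,}")
```

Testing each factor on its own takes 2,000 experiments, which is demanding but feasible. Pairs already number about two million and triples more than a billion, which is why the choice of which combinations to test has to be guided rather than exhaustive.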
The London Institute
The London Institute for Mathematical Sciences has enjoyed an impressive success story of its own. It is a private academic research centre—a rare beast in Britain, which has a comparatively thin ecosystem of research organisations, with universities holding a near-monopoly on research. Texan-born physicist Dr Thomas Fink founded the Institute in 2011 with the aim of filling this gap. It provides gifted scientists with the freedom and support to devote themselves full-time to what they do best: making fundamental discoveries. In 2018, the Institute was designated an Independent Research Organisation, making it eligible to compete with universities for Research Council funding. It is Britain’s first private research centre in the physical sciences to achieve that status.
As well as more established topics in physics and mathematics, the London Institute has a track record of research in mathematical biology. Soon after its inception, it was selected by DARPA to join an ambitious project to uncover fundamental laws in biology. Since then, its scientists have carried out research into such related areas as the geometry of genome space, high-dimensional biological inference and exact models of genetic regulatory networks.
A revolution in biology
Biology and mathematics have not, historically, proved natural bedfellows. Unlike physics, biology has resisted a rounded description in terms of a few fundamental equations. This is partly because it has an added layer of complexity: processes at different organisational length scales can influence one another. In physics, by contrast, what happens at one length scale is largely independent of what happens at others.
Yet there is compelling evidence that biology’s outward complexity may in fact mask elegant mathematical structure. For example, the human genome, the instruction manual for all the intricacies of life, runs to about 3 billion base pairs, which amounts to only about 3 gigabytes of data when each base is stored as a single character. That is the data equivalent of a two-hour movie. Evolution acts as a powerful suppressor of unneeded complexity, so diversity within an organism is intrinsically nonrandom. In other words, nearly every part plays a role; there is very little fat. Concision and efficiency are hallmarks of pattern, and mathematics is the language for expressing it.
Brokering the marriage of mathematics and biology will require delicate intellectual diplomacy, not to mention new kinds of mathematics. Yet the process has already begun, and Dr Kotter believes it will be transformative in ways that cannot now be predicted. “With the introduction of calculus, physics became a predictive science, rather than a merely descriptive one,” he says. “For three centuries, biology lagged behind. But new mathematical tools mean that biology is catching up.”
The final frontier
In uncharted territory, it’s hard to pick the best route in advance. But the London Institute has already settled on the broad strokes of its campaign. It will focus on three approaches in particular: bottom-up modelling, top-down inference and a mathematical analogy. Yet as Dr Fink points out, “The mathematical concepts behind all three will require an upgrade.”
Most physics modelling assumes that a system has settled down to a state of equilibrium in which opposing influences are balanced. But cells are transitioning through a series of states, so understanding their development is a fundamentally non-equilibrium problem. As a result, many of the available mathematical tools will not be up to the task. The old analogy pioneered by Conrad Waddington, of marbles rolling down a landscape, no longer applies. Instead, we need to think in terms of systems that can feed back into their own environments. In other words, the shape of the landscape can depend on which part you are exploring. In these Escherian landscapes, motion can involve cycles around stable states and even more exotic variations.
Inference, too, will need a shake-up. Classical statistics was developed for settings in which the number of variables being estimated is fixed. There, the bigger the data set, the more accurate the estimates. But in systems biology research, the number of variables being estimated tends to grow with the amount of data. Inference in this high-dimensional regime can confuse real patterns with noise, and miss the wood for the trees, a breakdown known as overfitting. The field of statistical physics can be used fruitfully here, by predicting the errors in the dangerous overfitting regime. Crucially, this gives scientists a way to discriminate between reliable and unreliable inference. For Bit.Bio, this means separating fact from fiction when planning its next experimental steps.
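A toy simulation shows how easily noise can pass for pattern when many variables are screened against few samples. It is only an illustration of the problem, not the statistical-physics machinery described above: the outcome and every candidate variable are pure random noise, and the sample size of 20, the function names and the seed are all assumptions made for this sketch.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def strongest_spurious_correlation(n_samples, checkpoints, seed=0):
    """Track the largest |correlation| found as more noise variables are screened."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]  # pure noise
    best, results = 0.0, {}
    for count in range(1, max(checkpoints) + 1):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]  # also pure noise
        best = max(best, abs(pearson(candidate, outcome)))
        if count in checkpoints:
            results[count] = best
    return results


results = strongest_spurious_correlation(n_samples=20, checkpoints=(1, 10, 100, 1000))
for n_variables, r in results.items():
    print(f"{n_variables:>5} noise variables: strongest |correlation| = {r:.2f}")
```

Nothing here is related to anything else, yet the best-looking correlation can only climb as more variables are screened. A method that predicts how large such errors should be is what allows a genuine signal to be told apart from the strongest accident.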
One of the master strokes of mathematics is casting the structure of an unsolved problem into the structure of a different problem that is already understood. This kind of thinking outside the box can accelerate progress in a new field, and may help here. If one writes down the equations governing the entire system of transcription factors, they turn out to be curiously similar to the equations describing interactions between neurons in the brain. In this analogy, stable neuron firing patterns are the equivalent of stable cell types. And we have in recent decades learned a lot about how to “reprogram” brain models by clever modification of specific variables. Adapting this mathematical insight to transcription, and combining it with experimental innovations at Bit.Bio, could offer a strategic head start in the quest to reprogram cells.
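The flavour of the analogy can be sketched with a Hopfield network, a textbook model of interacting neurons in which stored firing patterns act as stable states. The article does not say which brain model is meant, so that choice is an assumption, as are the two eight-unit patterns TYPE_A and TYPE_B, which merely stand in for two cell types with each unit playing the role of a factor that is on (+1) or off (-1).

```python
def train_hebbian(patterns):
    """Build symmetric weights that make each stored pattern a stable state."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j] / n
    return weights


def settle(weights, state, max_sweeps=10):
    """Update units one at a time until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = state[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Hypothetical "cell types": two mutually orthogonal on/off patterns.
TYPE_A = [1, 1, 1, 1, -1, -1, -1, -1]
TYPE_B = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([TYPE_A, TYPE_B])

noisy_a = [-1] + TYPE_A[1:]   # TYPE_A with its first unit flipped
noisy_b = TYPE_B[:-1] + [1]   # TYPE_B with its last unit flipped

print(settle(weights, noisy_a) == TYPE_A)  # the perturbed state relaxes back to TYPE_A
print(settle(weights, noisy_b) == TYPE_B)  # and likewise for TYPE_B
```

Each stored pattern pulls nearby states back towards itself, which is the sense in which a stable firing pattern resembles a stable cell type. Reprogramming, in this picture, means nudging the right units far enough that the system settles into a different stored pattern.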
Dr Fink believes the union of mathematics and biology could be as transformative for mathematics as it is for biology. “Life is the final frontier of mathematics,” he says. “In contributing to Bit.Bio’s mission, we will simultaneously push the boundaries of mathematical thought.” For Dr Kotter, creating human cells at scale could be just the beginning: “If we succeed in our aims, it will change science forever. Understanding the operating system behind cell identity will enable medical researchers to tackle the most pressing diseases, including cancer and dementia.”
Illustrations created by Nicholas Rougeux (www.c82.net). Each illustration represents an experiment in visualizing notes from music scores.