First principles, ab initio computational modeling of biological processes is one of those grand visions, like artificial general intelligence, that both inspires the imagination and conjures images of mad scientists exclaiming, “It’s alive!” At present, we are a long way from either, but real progress is being made. At times generative adversarial networks and systems biology may seem like magic, but they are not, Arthur C. Clarke’s observation (“Any sufficiently advanced technology is indistinguishable from magic.”) notwithstanding. There is no magic, simply a plethora of scientific and technological advances by global teams of researchers.
The dream of predictive biological modeling is an old one, with many proponents, dating to some of the earliest days of digital computing. Since then, progress has been slow but steady, dependent on both greater understanding of biological processes and increased computational power. To be clear, ab initio biological modeling is distinct from artificial life, which examines systems and emergent behavior that resemble biology but differ from it, although the two fields share important connections.
Why do we want to build such models? The first and foremost reason is to validate our understanding of biological processes, comparing computational predictions with experimental results. As Richard Feynman once remarked, “What I cannot create, I do not understand.” If we cannot build models that accurately duplicate experiment, we do not fully understand the biology. The second, of course, is to allow us to understand how changes to genes and gene expression can affect biological processes, with profound implications for the environment, and for health care and treatments. A corollary of this is the ability to predict the effects of gene editing and other modifications, all within critical ethical boundaries.
Twenty years ago, when I was director of NCSA, the NSF Director, Rita Colwell, and I used to discuss the possibility of predictive whole cell modeling, the limitations in our biological knowledge, and the computing infrastructure that might be required to make such a thing possible. A brilliant biologist, she knew far more about the biological aspects than I ever have, though I’d like to believe I brought a bit of computing game to the discussion. Mind you, at that time, the Human Genome Project was still underway, and we were really excited to have just deployed terascale Linux clusters and the NSF TeraGrid as national resources. Since then, we have come a long way in both biology and computing.
A Minimal Genome
Experimentalists have long targeted certain simple life forms (e.g., Caenorhabditis elegans, a transparent nematode; Arabidopsis thaliana, a small flowering plant with a short life cycle; Drosophila melanogaster, the fruit fly; and Saccharomyces cerevisiae, a yeast) as model organisms. The simplicity or specific morphology of those organisms simplifies study. Similarly, those building computational biological models have long chosen to start with the simplest living cells.
This raises the obvious question: what is the minimal genome required for life (i.e., the cell cycle of birth and death), and concomitantly, can we either find or construct such a minimal genome? Minimal is a subtle thing, as the set of genes needed for a cell to grow and reproduce in a perfect medium is different from the larger and interrelated set that allows it to thrive in adverse or changing conditions. Finally, the interrelation is key; the sum (the gene network) is more than the parts (the genes themselves).
In 2010, researchers at the J. Craig Venter Institute (JCVI) first created a chemically synthesized genome (JCVI-syn1.0), then in 2016 succeeded in creating a synthetic bacterial cell (JCVI-syn3.0) with a minimal number of genes that was still capable of reproducing in a controlled environment. Its close derivative, JCVI-syn3A, consists of fewer than 500 genes on a single circular chromosome. (See the Science articles in 2010 and 2016 for the details.) Not surprisingly, JCVI-syn3A has since been the target of whole cell modeling of increasing sophistication.
I wrote about the excitement surrounding the 2010 synthetic genome work in the Communications of the ACM, in a piece entitled In Vivo, In Vitro, In Silico. (I also posted on this blog.) As I noted then, the work was a fascinating example of the interplay of biology and computing, but it also highlighted that we did not fully understand the function of all the genetic material in even this simple genome. Indeed, when the 2016 article was published, the precise biological functions of roughly 30 percent of the JCVI-syn3.0 genes were unknown; the function of more, though not all, is now understood.
In Silico Modeling
In 2019, a University of Illinois team (Luthey-Schulten and colleagues) constructed a computational genome-scale metabolic model (GSMM), a flux balance model, of the metabolic network (the metabolism) of JCVI-syn3A, showing how DNA specifications drive molecular processes. That work, published in eLife, also probed how minimal the genome really is, suggested further genes that might be removed, and flagged some essential genes whose function was unclear.
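For readers unfamiliar with flux balance models, the generic textbook formulation is a linear program over reaction fluxes at steady state. The symbols here are my own labels, not notation taken from the eLife paper: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c weights the objective (often a biomass reaction), and the bounds encode uptake limits and reaction reversibility.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v^{\min}_{j} \le v_{j} \le v^{\max}_{j} \quad \text{for every reaction } j .
\end{aligned}
```

The constraint S v = 0 says that every internal metabolite is produced as fast as it is consumed. The specific objective and bounds used for JCVI-syn3A are described in the paper itself.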
I was beyond thrilled to read the latest whole cell modeling results from this team, which has continued expanding in silico models of JCVI-syn3A. Remember, a detailed whole cell model (WCM) includes the cell’s size, shape, components, reactions within the cell, and environmental interactions, all across the cell life cycle. Published in Cell earlier this year, these simulations of a group of roughly 200 cells use a fully dynamical kinetic model capable of demonstrating emergent behavior. (The University of Illinois press release can be read here.)
As the authors note, metabolism, genetic information processes, and growth can, at present, only be modeled via hybrid stochastic and deterministic simulations. Specifically, they model the metabolic network via deterministic ordinary differential equations (ODEs) and the kinetics of genetic processes via stochastic models on a cubic lattice of the cell interior. Given their computational requirements, the latter depend on GPUs for acceleration. Additional details on the mathematics of the models are found in the paper, and the code, a substantial fraction of which is written in Python, is available on GitHub (here and here).
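To give a flavor of what "hybrid stochastic and deterministic" means, here is a minimal sketch in plain Python. It is emphatically not the authors' code: it is a well-mixed toy with no lattice, no diffusion, and no GPU. One gene is simulated with Gillespie's stochastic algorithm, and one metabolite pool is integrated with a forward Euler ODE step whose consumption rate depends on the current enzyme count. Every name and rate constant is hypothetical, chosen only for illustration.

```python
import random

# All rate constants are hypothetical, chosen only to give a readable toy.
K_TX, K_MDEG = 0.5, 0.1             # mRNA synthesis and decay
K_TL, K_PDEG = 1.0, 0.05            # protein synthesis (per mRNA) and decay
K_IN, K_CAT, K_M = 1.0, 0.01, 1.0   # substrate uptake, turnover per enzyme, Michaelis constant


def gillespie_window(mrna, protein, window, rng):
    """Advance the stochastic gene-expression state by `window` time units."""
    t = 0.0
    while True:
        rates = (K_TX, K_MDEG * mrna, K_TL * mrna, K_PDEG * protein)
        total = sum(rates)             # always > 0 because K_TX > 0
        t += rng.expovariate(total)    # waiting time to the next reaction
        if t > window:
            return mrna, protein
        pick = rng.random() * total    # choose which reaction fires
        if pick < rates[0]:
            mrna += 1
        elif pick < rates[0] + rates[1]:
            mrna -= 1
        elif pick < rates[0] + rates[1] + rates[2]:
            protein += 1
        else:
            protein = max(protein - 1, 0)


def euler_window(substrate, protein, window, dt=0.01):
    """Integrate ds/dt = K_IN - K_CAT * protein * s / (K_M + s) with forward Euler."""
    for _ in range(int(round(window / dt))):
        rate = K_IN - K_CAT * protein * substrate / (K_M + substrate)
        substrate = max(substrate + dt * rate, 0.0)
    return substrate


def simulate(t_end=50.0, dt_comm=0.5, seed=1):
    """Alternate stochastic and deterministic updates every `dt_comm` time units."""
    rng = random.Random(seed)
    mrna, protein, substrate = 0, 0, 0.0
    trajectory = [(0.0, mrna, protein, substrate)]
    for i in range(1, int(round(t_end / dt_comm)) + 1):
        # Stochastic part first; the ODE then sees the updated enzyme count.
        mrna, protein = gillespie_window(mrna, protein, dt_comm, rng)
        substrate = euler_window(substrate, protein, dt_comm)
        trajectory.append((i * dt_comm, mrna, protein, substrate))
    return trajectory


if __name__ == "__main__":
    for t, m, p, s in simulate()[::20]:
        print(f"t={t:5.1f}  mRNA={m:3d}  protein={p:4d}  substrate={s:7.3f}")
```

The two solvers exchange state only at fixed communication intervals (`dt_comm` here). That is the simplest way I know to couple discrete molecule counts to continuous concentrations, and a real whole cell model handles the coupling with far more care.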
Looking Forward
Multiple groups are continuing to build ever more detailed cellular models, leveraging both growing biological understanding and greater computing capability. Modeling the behavior of a cell with a minimal genome is another step into an exciting future.