A recent episode of Long Story Short lampoons popular science coverage of the origin of life. Such coverage tends to be strong on storytelling but weak on fact. As if narrating an explorer's journey, complete with pith helmet, Long Story warns, "The first sort of made-up thing you should be on the lookout for is what we call Imagination Story-Time Land." There's also "Half-Truthiland" and "Outright Falsehood Jungle."
A scan of recent science news seems to confirm Long Story on this. Researchers are presented as making confident progress in solving the mystery of how life arose... while at the same time being all over the scientific map.
And on and on it goes. Much or all of this is creative storytelling, half-truths, or outright falsehood. If researchers really were making progress, wouldn't they be converging on the same answer? The reality is that theories of how life arose from non-life, spontaneously and without intelligent guidance, face enormous statistical hurdles. Storytelling is fun. Statistical analysis is less so. But bear with me as I throw some mathematical light on the origin of life.
Scientists tackle origin-of-life problems through two primary approaches: bottom-up and top-down. The bottom-up approach begins at the molecular level, attempting to reconstruct life's emergence from basic chemical elements on the prebiotic Earth. This method scrutinizes prebiotic chemistry, exploring how simple molecules could have self-assembled into more complex structures, eventually forming the basic building blocks of life, then protocells, and developing replication and metabolic capabilities.
The two main scenarios within the bottom-up approach are the "metabolism-first" and "RNA world-first" hypotheses. The metabolism-first scenario proposes that self-sustaining chemical reactions emerged before genetic information, while the RNA world hypothesis suggests that RNA molecules capable of both storing genetic information and catalyzing chemical reactions were the precursors to life. Despite decades of research and numerous variations, including proposals like the iron-sulfur world, lipid world, and protein world hypotheses, scientists have not been able to develop a fully coherent and plausible scenario for the origin of life. These attempts, numbering in the dozens or more, have faced significant challenges in explaining the transition from non-living chemistry to the complex, self-replicating systems that characterize life. Hence the conceptual chaos we observe in popular reporting about life's origin.
This has been well expressed by Eugene V. Koonin of the National Center for Biotechnology Information. As he wrote in his book The Logic of Chance (p. 252):
Despite many interesting results to its credit, when judged by the straightforward criterion of reaching (or even approaching) the ultimate goal, the origin of life field is a failure -- we still do not have even a plausible coherent model, let alone a validated scenario, for the emergence of life on Earth. Certainly, this is due not to a lack of experimental and theoretical effort, but to the extraordinary intrinsic difficulty and complexity of the problem. A succession of exceedingly unlikely steps is essential for the origin of life, from the synthesis and accumulation of nucleotides to the origin of translation; through the multiplication of probabilities, these make the final outcome seem almost like a miracle. [Emphasis added.]
When a distinguished computational biologist like Koonin uses language of that kind ("almost like a miracle"), we should take note. The main reason the barrier looks unbridgeable is this: the cell operates as a sophisticated, self-reproducing chemical factory. It is an integrated system requiring a minimal set of interconnected, interdependent components that function cooperatively, and only that tight integration allows the cell to maintain its complex processes and replicate itself.
As University of North Carolina biochemist Charles W. Carter Jr. wrote in 2017, emphasizing the level of complexity needed for the cell to function as a living unit:
Studies of gene-replicase-translatase (GRT) systems reveal that gene replication and coded expression are interdependent. Living systems now produce proteins from information encoded in genes using protein translatases whose genes are copied using protein polymerases. Could self-organization of both processes be so strongly coupled that they emerged simultaneously?
The top-down approach, meanwhile, starts with existing life forms and works backward. It uses comparative genomics, phylogenetic analysis, and studies of minimal genomes to infer the properties of early life. This method seeks to identify the Last Universal Common Ancestor (LUCA) and to determine the core set of genes and functions necessary for life. By studying extremophiles and conducting minimal-genome experiments, scientists hope to glimpse the essential components and processes that define primitive life forms. In this way, the top-down approach reveals the fundamental requirements of living systems; integrating it with the bottom-up perspective is meant to bridge the gap between non-living chemistry and complex modern organisms.
By determining the minimal set of proteins required for a basic functioning cell, we can perform probability calculations to estimate the likelihood of a minimal proteome -- "the complete set of proteins expressed by an organism" -- arising spontaneously. These calculations provide insights into the statistical improbability of life emerging by chance alone.
The J. Craig Venter Institute has produced a near-minimal synthetic organism, the JCVI-syn3.0 bacterial strain. With a proteome of 438 proteins, it serves as the baseline for our calculations.
In proteins, the regions most critical for enzymatic activity are typically the active site, the binding site, and the structural core, where little or no variation is tolerated. Less critical regions can accommodate a wider range of amino acid substitutions.
Considering these factors, we can do a probability calculation for a typical protein of 250 amino acids. The essential regions (active site, binding site, structural core) amount to around 50 percent of the sequence, or 125 amino acids; the less critical regions make up the remaining 125. Our calculation will treat these two classes of functional regions separately.
Our givens, then: the number of proteins is 438, the average protein length is 250 amino acids, and the critical region is 50 percent of each protein, or 125 amino acids.
Here we go with the probability calculation for one protein. For the essential regions, where each position requires one specific amino acid out of 20, the probability is (1/20)^125 ≈ 2.4 × 10^-163. For the less critical regions, assuming a 1-in-5 chance of an acceptable amino acid at each position, the probability is (1/5)^125 ≈ 4.3 × 10^-88. Combining these, we get 2.4 × 10^-163 × 4.3 × 10^-88 ≈ 1 × 10^-250 (equivalently, (1/20 × 1/5)^125 = (1/100)^125 = 10^-250). This estimate is consistent with other estimates of the rarity of functional proteins when scaled by length. For all 438 proteins, then, the result is (10^-250)^438 ≈ 10^-109,500.
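For readers who want to check the arithmetic, here is a minimal Python sketch of the same calculation, working with base-10 logarithms because the raw probabilities are far too small for ordinary floating-point numbers. The parameters (438 proteins, 125 essential and 125 tolerant positions per protein, and the 1-in-20 and 1-in-5 per-position probabilities) are simply the assumptions stated above, not new data.

```python
from math import log10

# Assumptions taken from the discussion above (not new data):
# 438 proteins, each 250 amino acids long, split into 125 "essential"
# positions (1 specific residue out of 20) and 125 "less critical"
# positions (a 1-in-5 chance of an acceptable residue).
NUM_PROTEINS = 438
ESSENTIAL_POSITIONS = 125
TOLERANT_POSITIONS = 125

# Work with base-10 logarithms; the raw probabilities underflow a float.
log_p_essential = ESSENTIAL_POSITIONS * log10(1 / 20)   # ≈ -162.6
log_p_tolerant = TOLERANT_POSITIONS * log10(1 / 5)      # ≈ -87.4
log_p_one_protein = log_p_essential + log_p_tolerant    # ≈ -250
log_p_all_proteins = NUM_PROTEINS * log_p_one_protein   # ≈ -109,500

print(f"One protein:  about 10^{log_p_one_protein:,.0f}")
print(f"All {NUM_PROTEINS} proteins: about 10^{log_p_all_proteins:,.0f}")
```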
Let's compare that probability with the probability of winning the Powerball lottery multiple times in a row. We first need the odds of winning Powerball just once: approximately 1 in 292,201,338, a probability of about 3.42 × 10^-9. Now we need to find how many consecutive Powerball wins would equal the probability we calculated for the proteins. That is, we solve 10^-109,500 = (3.42 × 10^-9)^x. Taking the log of both sides gives -109,500 = x × log(3.42 × 10^-9), or -109,500 = x × (-8.466). Solving for x, we get 109,500 divided by 8.466, or about 12,900.
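As a quick check, the Powerball comparison can also be done in a few lines of Python, reusing the proteome exponent from the sketch above together with the published single-jackpot odds; everything else is just logarithms.

```python
from math import log10

# Inputs: the exponent from the protein calculation above (an assumption
# of this illustration, not measured data) and the 1-in-292,201,338
# single-jackpot odds for Powerball.
log_p_proteome = -109_500
p_jackpot = 1 / 292_201_338          # ≈ 3.42 × 10^-9

# Solve 10^log_p_proteome = p_jackpot ** x for x:
# x = log_p_proteome / log10(p_jackpot)
consecutive_wins = log_p_proteome / log10(p_jackpot)
print(f"Equivalent consecutive Powerball wins: {consecutive_wins:,.0f}")  # ≈ 12,934
```

Both sketches are, of course, only as good as the simplifying assumptions behind them.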
This means the probability of all 438 proteins forming spontaneously is roughly equivalent to the probability of winning the Powerball jackpot about 12,900 times in a row. The estimate does not account for the fact that multiple copies of most proteins would likely be required. It also leaves out the DNA required to manage the production and maintenance of proteins, and the need to interconnect the proteins properly. For the interconnection challenge, see physicist Brian Miller's article at Evolution News.
This astronomically large number illustrates the extreme improbability of such a complex system arising by chance alone. It highlights the challenges in explaining the origin of life through purely random processes.
We can express the argument this way: A minimal functional cell requires a specific set of integrated proteins. The probability of this specific set of proteins forming spontaneously is astronomically low (equivalent to winning the Powerball lottery roughly 12,900 times in a row). Therefore, the spontaneous formation of a minimal functional cell through random processes is virtually impossible.
The astronomical improbability of the spontaneous formation of even a minimal set of functional proteins necessary for life presents a significant problem for purely naturalistic explanations of life's origin. We can see, then, why researchers in the field are, as Rice University chemist James Tour has put it, "clueless" about how life began.
When we are faced with such daunting improbability, it is reasonable to consider alternative explanations. Most scientists seeking a resolution of the puzzle don't want to go there, whether for reasons of philosophical outlook, peer pressure, or personal preference. However, when examining highly specified and complex systems that appear to be fine-tuned for function, especially when the probability of their chance occurrence is vanishingly small, the inference to design becomes a logical possibility, at the very least.
The argument for design is strengthened by the observation that living systems exhibit characteristics often associated with designed objects -- such as information content, goal-directed processes, and interdependent parts functioning as a whole. The minimal cell, with its precisely coordinated set of proteins and genetic instructions, bears hallmarks of purposeful arrangement rather than random assembly. For an intrepid scientific explorer, like the one from Long Story, the conclusion of intelligent design beckons.