"Most biologists have a guilty secret: they started as bird-watchers."
Steve Jones: Almost Like a Whale
How did life arise? This is, not surprisingly, a very frequently asked question (and has come up more than once on uk.rec.birdwatching). Indeed, the origin of life is one of the great questions that has occupied mankind since history began and probably long before. The process of evolution - how modern living things evolved from a few original life forms - is now fairly well understood, certainly as far as the last few hundred million years are concerned. The generation of the first forms of life from non-living matter - known to scientists as abiogenesis - is much less well understood.
But, contrary to what some people think, the subject is not so obscure that we can learn nothing about it. Scientists are making progress on it, though there is still a long way to go before we have a complete and convincing description.
I'm writing this essay because I couldn't find a good introduction on the Web for people without a technical background. Articles I found were either very superficial, or covered one small area, or else they tossed around terms like "nucleotide" and "chemolithoautotroph" on the assumption that the reader would be familiar with them. So I thought I'd have a go at writing about the subject for people with little knowledge of chemistry. (Given that the subject is all about chemistry, this is about as easy as nailing protoplasm to the ceiling, but I'm game if you are.)
Caveat: this is a very complicated area, with many lines of research. This description is enormously simplified, and some of the things here may well be simply wrong. Hopefully this essay will however at least give the interested lay reader enough of a start to go looking for more detailed information.
On the Web there are a depressingly large number of sites with titles like "Why abiogenesis is impossible". I intend to give them the amount of attention they deserve - i.e. none whatsoever - except for one point. It is practically certain that living things once did not exist - and on this point the generally Christian-fundamentalist authors of such sites do not differ from informed scientific opinion. Since we are indubitably here now, abiogenesis has clearly happened, and proclaiming its impossibility is rather pointless. The point at issue is how. (And should anyone have any evidence that divine intervention was part of the process, they are welcome to produce it.)
One idea that has received quite a lot of publicity is the idea that life originated elsewhere in the universe and arrived on earth in meteorites or comets, or was even deliberately delivered here by intelligent extraterrestrials.
This can be dealt with quite briefly: it is possible, but there is no real evidence to support it. And it solves nothing, because one is then left with the question of how life originated wherever it did originate. (Hoyle and Wickramasinghe have claimed to have some evidence for life in space, and even for the continuing arrival of viruses from outer space, but their evidence is, to put it politely, tenuous in the extreme. See for example Shapiro's book Origins, chapter 9.)
This idea should therefore be set aside unless and until a compelling reason arises for taking it more seriously. Note that it has indeed been demonstrated that organic chemicals have arrived on the earth in meteorites - but that is a quite separate matter.
And now back to the real subject.
Let point A be the set of chemicals which are known, or strongly presumed, to have been initially present on the young earth. Point B is the first living organism. We have to get from A to B.
Although people once thought (100+ years ago) that this might happen in a single step if you waited long enough, it is now clear that it is a process involving many steps.
I will sketch here what is known, or suspected, about the steps involved. The obvious route for the description to take is to start at A and go to B. However I'm going to start at B. This is not because I'm being perverse (feel free to disagree if you wish) but because it's easier to visualise where you're going when you know the end point. Then I'll go back to A and look at possible routes to get to B.
As most people know, the key to the chemistry of life is the element carbon. It is the only element whose atoms readily form long, stable chains, and so the only element which can form the complex molecules which are needed for life.
Besides carbon, the major elements in living organisms are hydrogen, oxygen and nitrogen; around twenty other elements are also essential for life as we know it.
Molecules are described by chemical formulae. (It's useful to know something about them, because it helps to visualise what is happening.)
The chemicals in living matter - organic chemicals - follow the same basic chemical laws as inorganic chemicals. Indeed compounds containing a single carbon atom, such as carbon dioxide, can be regarded as either organic or inorganic. But whereas inorganic molecules typically contain fewer than ten atoms, organic molecules can contain many thousands. Because organic molecules can be so much larger than inorganic molecules, their structure and behaviour can be correspondingly complex.
Modern animals and plants are fantastically complicated. Bacteria, however, are much simpler - though still very complex - consisting of a single prokaryotic cell. (Eukaryotic cells, the type found in animals and plants, are widely thought to have originated as a group of different bacteria which merged together.)
There seems little doubt that the first living organism would have been something like a small bacterium (and not like an amoeba as some people seem to think - amoebas are eukaryotes). It could however have been considerably simpler. Modern bacteria have adaptations which enable them to do such things as compete with other bacteria, cope with microbes trying to eat them, invade animals and do battle with their immune systems etc. The first cell would obviously need none of this.
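The question is: how simple could it be? There are many components in even a prokaryotic cell, but the four key components, which it seems impossible to do without, are: a cell wall (membrane) made of lipids, which holds the cell together; DNA, which holds the genetic information; proteins, which do the actual work; and a system for providing energy.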
The way in which the genetic information is used is as follows. The genetic information is stored in the DNA in the form of a genetic code. When needed, the information is transcribed into RNA (a bit like the process of transcribing Greek place names into the western European alphabet). This then gathers together amino acids which are floating around in the cell interior, and builds them into the proteins which do the actual work. This might for example be feeding, enlarging the cell, repelling intruders, or reproducing by dividing into two new cells.
The basic flow of information is thus: DNA -> RNA -> Protein
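For readers who find a program easier to follow than a description, here is a minimal Python sketch of that flow. It is a toy, not a model of real cell machinery: it assumes we start from the DNA "coding" strand (so transcription reduces to swapping T for U), and its codon table holds only a handful of the 64 possible triplets. The names `transcribe`, `translate` and `CODON_TABLE` are made up for this illustration.

```python
# Toy illustration of DNA -> RNA -> Protein.
# Assumption: we start from the DNA "coding" strand, so transcription
# amounts to replacing thymine (T) with uracil (U).
# Only a few of the 64 codons are listed; a real table covers them all.

CODON_TABLE = {
    "AUG": "Met",                                            # methionine (usual start codon)
    "UUU": "Phe", "UUC": "Phe",                              # phenylalanine
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",  # glycine
    "AAA": "Lys", "AAG": "Lys",                              # lysine
    "UGG": "Trp",                                            # tryptophan
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",             # stop signals
}


def transcribe(dna_coding_strand: str) -> str:
    """Copy the DNA message into RNA: T becomes U."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(rna: str) -> list[str]:
    """Read the RNA three letters at a time until a stop codon.

    Raises KeyError for any codon missing from this cut-down table.
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    dna = "ATGTTTGGCAAATGGTAA"
    rna = transcribe(dna)
    print(rna)             # AUGUUUGGCAAAUGGUAA
    print(translate(rna))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```

Each three-letter "word" in the RNA picks out one amino acid, and the stop triplet ends the protein.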
When a cell reproduces, the two chains of the DNA are unwound, and on each single chain a new complementary chain is built, thus producing two double helices instead of one. Each daughter cell receives one of the new DNA molecules.
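The same idea as a short Python sketch, under the simplifying assumption that a strand is just a string of the letters A, T, G and C. Pairing is A with T and G with C; because the two chains of a double helix run in opposite directions, the new partner chain is written as the reverse of the complement. The function names are hypothetical, chosen for this illustration.

```python
# Toy model of DNA replication: each old chain is a template for a new one.
# Assumption: strands are plain strings written in the same reading direction,
# so the partner of a strand is its reverse complement.

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Build the complementary chain (reverse complement)."""
    return "".join(PAIR[base] for base in reversed(strand))


def replicate(helix: tuple[str, str]) -> list[tuple[str, str]]:
    """Unwind a double helix and build a new partner for each old chain."""
    old_a, old_b = helix
    return [(old_a, complement_strand(old_a)),
            (old_b, complement_strand(old_b))]


if __name__ == "__main__":
    strand = "ATGCGT"
    helix = (strand, complement_strand(strand))
    for daughter in replicate(helix):
        print(daughter)
    # ('ATGCGT', 'ACGCAT')
    # ('ACGCAT', 'ATGCGT')
```

Each daughter helix keeps one old chain and gains one new one, yet both carry the same information as the parent.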
Now, the major problem in abiogenesis is producing these four components and bringing them together in the right way. It seems highly improbable that all would have arisen simultaneously from inorganic matter - something must have come first.
Although the lipid cell walls and energy-system are not simple, this problem is generally considered to revolve around the issue of which came first: the DNA or the protein.
It would appear that protein is not sufficient because it carries no genetic information. On the other hand DNA is not sufficient because it cannot express that information - it cannot catalyse reactions, it cannot replicate itself, it cannot really do anything on its own. (For a computer analogy, DNA forms the memory, while the proteins form the processor - either is useless on its own.) But both proteins and DNA are complex. It is difficult enough to see how either one formed in the prebiotic environment, let alone both.
A major breakthrough was made when it was discovered that, although DNA plays no catalytic role in living cells, RNA can catalyse certain reactions. This led people to propose that RNA was in fact the first self-replicating molecule, and that proteins came later.
This is not by any means certain. Getting RNA to form in the absence of proteins is still far from straightforward, and some people still think that proteins came before nucleic acid.
But what does seem highly likely is that living organisms had a phase where they used RNA to hold the genetic information, and that DNA came later. (One of the hints of this is that some viruses, such as the retroviruses, still use RNA as their genetic material - HIV, the virus which causes AIDS, is an example.)
Having established this, let's go back to the beginning and start going forwards in time.
What was the early earth atmosphere like? Models of solar system formation suggest that little would have been left over from the gas of the nebula from which the planets formed - it would have been blown away by the solar wind. This means that most of the atmosphere would have come from inside the earth - via volcanos or mid-ocean vents - with a few traces of substances brought in by comets and meteorites.
The main components would probably have been methane, ammonia, carbon dioxide, hydrogen, nitrogen and water vapour. There is good reason to suppose that many other simple compounds would also readily have formed, such as formaldehyde, hydrogen cyanide (HCN), cyanogen (C2N2) and cyanoacetylene (HC3N). Most if not all of these substances have actually been detected in interstellar space.
There was virtually no free oxygen. We know this for certain because up until about 2.5 aeons ago minerals formed which cannot form in the presence of oxygen. (Minerals which require oxygen to form, such as ferric oxide, turn up from about 2.3 aeons ago onwards).
In a famous experiment in 1952, Stanley Miller demonstrated that amino acids could be formed in this sort of atmosphere by passing electrical sparks (lightning) through it.
It is now thought that the primitive atmosphere was not as strongly reducing as Miller assumed. However repeated experiments with other atmospheric combinations have also produced amino acids - in fact the majority of the 20 amino acids found in modern organisms have been formed in plausible prebiotic environments. Amino acids have also been found in the famous Murchison meteorite, which fell in Australia in 1969. It is practically certain that amino acids existed on the primitive earth.
Similar experiments have also succeeded in producing the bases needed for nucleic acids, from reactions between ammonia and the above-mentioned cyanide, cyanogen and cyanoacetylene.
The next important step towards life would have been the formation of larger molecules, called polymers. In principle this occurs by simply linking monomers together. This isn't necessarily easy however. For example the polymerisation of sugars to form starch requires a water molecule to be expelled for each monomer added. If the sugars are themselves in water (sea or pond) then this is difficult: trying to get rid of a water molecule to an environment full of water is a bit like trying to sell sand to Saudi Arabia. Living things use enzymes to achieve this. (No, not selling sand ...)
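To make the water bookkeeping concrete, here is the overall equation for linking n glucose molecules into a starch-like chain (a simplification: real starch also has branch points). Each new link between two sugars gives off one molecule of water, so a chain of n units releases n - 1 of them:

```latex
n\,\mathrm{C_6H_{12}O_6} \;\longrightarrow\; \mathrm{H(C_6H_{10}O_5)_{n}OH} + (n-1)\,\mathrm{H_2O}
```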
These processes can however be expedited by catalysts of various kinds. For example certain types of clay act as a catalyst: using them, polypeptides of over 50 amino acids have been created in the laboratory. Pyrite is another possible catalyst.
Another example of a process expedited by inorganic catalysts, discovered by Günther Wächtershäuser, is the production of pyruvate, a three-carbon organic compound, catalysed by the mineral iron sulphide. This is a potential building block for numerous substances, and has another important use, discussed below.
A key step in the process is the formation of self-replicating molecules. This is where we hit the chicken versus egg question: were the first self-replicators nucleic acids or proteins?
Well actually it is possible that the first replicator was something simpler than either. Various self-replicating molecules are now known, though we have little idea yet which if any of them was actually the first to appear. However, here I will just sketch the ideas for protein-first and nucleic-acid-first.
Note that replication alone is not enough. If the replication is too faithful, then only identical copies are produced, and no progress is made. If the replication is not at all faithful - e.g. the products have wildly varying chain lengths - then chaos ensues and again no progress is made. At some stage a replicator must have been formed which was nearly but not quite reliable.
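A toy Python model makes the trade-off visible. It assumes the only kind of copying error is swapping one base for another at a fixed rate per base (real copying also inserts and deletes bases, which is where varying chain lengths come in). `copy_with_errors` and `fraction_identical` are hypothetical helpers written for this sketch.

```python
import random

BASES = "ACGU"  # the four RNA bases


def copy_with_errors(sequence: str, error_rate: float, rng: random.Random) -> str:
    """Copy a sequence base by base; with probability error_rate a base
    is replaced by one of the three other bases (a point mutation)."""
    copied = []
    for base in sequence:
        if rng.random() < error_rate:
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def fraction_identical(parent: str, error_rate: float,
                       copies: int = 1000, seed: int = 1) -> float:
    """Share of copies that come out identical to the parent."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    same = sum(copy_with_errors(parent, error_rate, rng) == parent
               for _ in range(copies))
    return same / copies


if __name__ == "__main__":
    parent = "GAUC" * 12  # a 48-base toy replicator
    for rate in (0.0, 0.01, 0.5):
        print(f"error rate {rate}: {fraction_identical(parent, rate):.2f} identical")
```

With an error rate of 0 every copy is identical, so there is nothing for selection to choose between; at 0.5 practically no copy matches its parent; at around 0.01 per base most copies are faithful but a steady trickle of variants appears, which is the "nearly but not quite reliable" regime.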
If the process followed the nucleic-acid-first route, then it would have been necessary first to fabricate nucleosides.
Plausible processes are known for the production of both the sugar and base components, with the sugar (ribose) being formed from formaldehyde and the base from cyanide. The problem is that these two processes require different conditions. They must presumably have taken place in separate locations, and the products then brought together - it is not yet clear how that would happen.
Again, when the components are brought together, possible prebiotic processes for turning them into nucleosides are known, but they produce very low yields. This is one of the reasons why some scientists prefer the protein-first route.
The next step would be the formation of nucleotides, by adding phosphate to the nucleosides. Various possible mechanisms are known for this.
Then we have to form oligonucleotides - short pieces of nucleic acid. This also is a tricky step, but ways of achieving it have been proposed (see the talk.origins FAQ on this in the references below). If we can get that far, then extending the nucleic acid molecule to greater lengths is reasonably straightforward.
If RNA could be formed, then the possibilities are great. A primitive form of evolution in RNA has been demonstrated: RNA replicating repeatedly in the presence of an enzyme that normally breaks it down eventually became resistant to that enzyme.
The protein-first route has been championed by people like Robert Shapiro and Sidney Fox.
The basic idea here is somewhat similar to the above. However instead of starting with nucleotides, one starts with about four to six of the simplest amino acids, which polymerise to form polypeptides and then proteins. This part is more straightforward than the nucleic-acid equivalent.
The tricky part is finding a way to get the proteins to replicate themselves effectively. A small start on this has been made - see the article by Wills - but it doesn't seem to have got as far as research on catalysis by RNA.
While the hottest debates in abiogenesis have been around the protein versus nucleic acid issue, there have also been competing ideas for the origin of the cell membrane - the container which holds a cell together. Ideas for these include - wait for it - liposomes, coacervates, and proteinoid micro-spheres.
You haven't gone away yet? OK - let's explain. Coacervates are small, roughly spherical, objects formed of large organic molecules, which form in water. They can operate rather like a cell membrane, letting only certain molecules in and out.
Proteinoid micro-spheres are somewhat similar, formed from polypeptides in very hot water which is then cooled. They are called "proteinoid" because the substances involved have a similar structure to proteins - based on chains of amino acids - but do not match any proteins actually met in living organisms. They are more stable than coacervates. In appropriate conditions they can grow and subdivide. They are the cell container favoured by the protein-first scientists.
Liposomes are tiny double-walled bubbles spontaneously formed by lipids in water. The attractiveness of these as the initial cell membrane is obvious - the membrane of bacteria is also basically a double-walled lipid layer. The bacterial membrane, however, includes many proteins which allow specific chemicals to enter or leave; without these the membrane is almost impermeable. The lipids in bacterial membranes have carbon chains of 16 to 18 atoms. It has been found that membranes made from shorter chains - 10 to 14 atoms - are slightly permeable and could potentially function as an early cell membrane. (See the article by Zimmer.)
All of these possible membranes could have formed in prebiotic conditions. So whereas it was once thought that the cell membrane was something formed once the nucleic-acid/protein system had got going, it is possible that the first membrane actually formed very early and provided the container in which the nucleic-acid/protein system was able to develop.
OK, so some way or other, we have got a self-replicating molecule. It probably doesn't reproduce itself terribly well, and only in restricted conditions. Where next?
It could well be a hypercycle. This is a cycle of different molecules whereby the presence of A enhances (catalyses) the production of B, B enhances the production of C, C enhances D and D enhances A (for 4 components - there can be more). Without going into details (because I don't understand them ;-) it appears that if such a hypercycle gets going within a protective membrane, it favours more accurate reproduction of the component molecules and also division into two smaller units when the concentration of the molecules reaches a certain level.
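The full theory is mathematical, but the basic behaviour can be sketched numerically. What follows is a minimal Python sketch of the textbook "elementary hypercycle" equation of Eigen and Schuster, in which member i - 1 catalyses the production of member i and a dilution term keeps the total concentration fixed. The rate constants, starting concentrations and simple Euler time-stepping are assumptions for illustration, and the sketch says nothing about membranes or cell division.

```python
# Elementary hypercycle (Eigen-Schuster replicator equation), Euler-integrated:
#   dx_i/dt = x_i * (k_i * x_{i-1} - phi),   phi = sum_j k_j * x_j * x_{j-1}
# x_i is the relative concentration of member i (the x_i sum to 1).
# Member i-1 catalyses member i; index -1 wraps round to the last member,
# which closes the cycle (D enhances A).


def hypercycle_step(x: list[float], k: list[float], dt: float) -> list[float]:
    n = len(x)
    # growth[i]: production of member i, catalysed by member i-1
    growth = [k[i] * x[i] * x[i - 1] for i in range(n)]
    # phi is a dilution flux that keeps the total concentration constant
    phi = sum(growth)
    return [x[i] + dt * (growth[i] - x[i] * phi) for i in range(n)]


def simulate(x0: list[float], k: list[float],
             dt: float = 0.01, steps: int = 5000) -> list[float]:
    x = list(x0)
    for _ in range(steps):
        x = hypercycle_step(x, k, dt)
    return x


if __name__ == "__main__":
    # Four members A, B, C, D with equal rate constants (an assumption).
    start = [0.4, 0.3, 0.2, 0.1]
    end = simulate(start, k=[1.0, 1.0, 1.0, 1.0])
    print([round(v, 3) for v in end], "total:", round(sum(end), 6))
```

The point to notice is that, because of the cyclic coupling, all four members persist: none is driven to extinction, since each depends on its predecessor for its own production. With independent replicators instead of a cycle, the fastest one would eventually take over.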
A tantalising and poetically elegant possibility of the hypercycle is that the answer to the question: "which came first - the chicken or the egg?" will, just as with Gallus gallus, turn out to be: neither. It is conceivable that nucleic acids and proteins developed together in a hypercycle which started with simpler molecules.
We have now arrived at the stage of the protobiont (or prebiont or progenote - the terms don't seem to have been standardised yet). A protobiont is an aggregate of organic material that manages to maintain an internal environment different from its surroundings: a cell membrane of some sort is therefore an essential part.
The protobiont initially consists of a hypercycle within a membrane of some sort. This becomes (if it is not already) a nucleic acid / protein hypercycle. Over time, the molecules in the hypercycle become more complex and sophisticated and better at reproducing themselves.
The key part is the development of the genetic code - the strict mapping from patterns of nucleotides to patterns of amino acids in proteins. This was once thought to be a completely arbitrary affair. However, recent work suggests that at least some amino acids do in fact preferentially bind to the codon triplets which code for them. This is quite exciting because it immediately suggests a mechanism for the evolution of the code: first crudely, with the amino acids binding directly to the RNA (but often not binding in the right place!), and then with more sophistication, using enzymes and culminating in the current synthetases (which nearly always get it right). The code may have started with a dozen or so amino acids and only later expanded to the current 20.
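A quick count shows why the code words are triplets rather than pairs. With four different bases, and 20 amino acids plus a "stop" signal to encode, pairs of bases fall short while triplets leave room to spare:

```latex
4^2 = 16 < 20 + 1 \qquad \text{(pairs: too few)}
4^3 = 64 \geq 20 + 1 \qquad \text{(triplets: enough)}
```

The spare capacity is why most amino acids are coded by more than one triplet.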
Once the genetic code was in place, modified forms of RNA had a way of (fairly) reliably passing on advantageous changes to their offspring. Any RNA molecule that happened to acquire an advantageous mutation (thanks to a copying error) could not only produce more "offspring", but also ensure that those offspring in turn were more successful.
The other crucial component of the cell, which I haven't yet discussed, is the energy-provision system - and I haven't found as much about the origins of this.
Energy provision in all cells revolves to a large extent around ADP and ATP - adenosine diphosphate and adenosine triphosphate. As its name suggests, an ADP molecule has two phosphate groups. It is turned into ATP by the addition of a third phosphate group, which requires energy. At the site of a reaction requiring energy, ATP gives up its third phosphate, releasing the energy at the same time. In effect it operates like a little battery.
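In chemical shorthand, the charging and discharging of the "battery" is a single reversible reaction, where P_i stands for inorganic phosphate. The energy figure is the standard textbook value under standard biochemical conditions; the amount actually available in a living cell differs somewhat:

```latex
\mathrm{ATP} + \mathrm{H_2O} \;\rightleftharpoons\; \mathrm{ADP} + \mathrm{P_i},
\qquad \Delta G^{\circ\prime} \approx -30.5\ \mathrm{kJ\,mol^{-1}}
```

Read left to right, ATP gives up its third phosphate and releases energy; read right to left, the cell must spend at least that much energy to recharge ADP into ATP.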
Now what is ADP? It is nothing more than the adenine nucleotide with one extra phosphate group attached. In other words once one has a good explanation for the origin of the nucleotides and RNA, it may be a fairly small step to understanding the origin of ADP/ATP.
It is generally thought that the first living organisms were chemolithoautotrophs - a somewhat horrendous word meaning that they used abiotically produced substances both for building their bodies and for provision of energy.
One of the key inputs to the energy cycles of modern organisms is pyruvate, which was mentioned above as being produced abiotically. Possibly this abiotically produced pyruvate formed an energy input to early organisms, until they were able to produce it themselves from glucose by glycolysis. However, the first living things made no use of the primary energy source used by practically all organisms today: photosynthesis. Plants capture the energy of sunlight; animals and fungi use the energy provided by plants in the form of carbohydrates. Photosynthesis generates oxygen and, as mentioned above, oxygen only turns up in the earth's atmosphere much later.
When did the protobionts cross the boundary to living cells?
There is no hard boundary between non-life and life. Even today the boundary is not as clear as many people think. Are viruses living organisms? Most scientists consider they are not, as they are unable to function independently - they can only reproduce within another living cell. But if on those grounds they are not, what about the Mycoplasma - very simple bacteria which are also entirely parasitic? (And if viruses are living organisms, what about prions?)
It is however probably fair to say that when protobionts got to the stage of being able to produce multiple offspring, which are each almost but not quite accurate copies of the mother cell, and when those offspring could compete with each other, the most successful ones surviving to produce further offspring which inherited the traits which made their parent successful, then life had arrived.
In other words life arrived at the moment when evolution started: the biological arms race of organisms each trying to outdo each other, which eventually led to the myriads of life forms today, including falcons, swallows and birdwatchers. (Determining the survival advantage of birdwatching is left as an exercise for the reader.)
Life may actually have started more than once, at different places or times. But various pieces of evidence - such as the common genetic code in all living organisms - strongly suggest that eventually only the offspring of one abiogenesis survived. Ultimately even the bacteria are our cousins.
As mentioned above, it is thought very likely that the first living organisms used RNA as genetic material. At some point this was replaced by DNA. In principle this is not a difficult step: the structures of the two molecules are very similar, so producing DNA would not have been hard. On the other hand, it is not yet known what the immediate advantage of producing DNA was. Once the genetic information was held in DNA, though, the organisms involved would have gained a major advantage from the extra stability of the information and the greater faithfulness of copying, allowing larger genomes and more complex organisms.
The earth, together with the rest of the solar system, is 4.5 to 4.6 aeons old, though the crust didn't solidify until after a few hundred million years.
Scientists have for some time been pushing back the date at which life is known to have existed. Evidence of life from Western Australia at about 3.46 aeons ago now seems to be widely accepted, with a slightly less certain identification at 3.77 aeons. Evidence of life in the extremely ancient rocks of SW Greenland, about 3.85 aeons old, has been proposed, rejected, and then proposed again.
We know, thanks to evidence from the moon, that in the early days of the solar system the planets were subject to heavy bombardment from the giant rocks which still littered the solar system. This bombardment would presumably have made life impossible. It ended around 4.0 to 3.8 aeons ago.
This leaves us with the tantalising possibility that life actually began just about as soon as it could have done - perhaps within a few million years of it being physically possible. (That's 0.1% of the current age of the earth.) This may suggest that the formation of life is actually rather straightforward!
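As a rough check of that percentage, assuming "a few million years" means about 4.5 million years and taking the earth's age as 4.5 aeons (4.5 thousand million years):

```latex
\frac{4.5 \times 10^{6}\ \text{years}}{4.5 \times 10^{9}\ \text{years}} = 10^{-3} = 0.1\%
```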
Is abiogenesis still happening today? No. Firstly, because many of the important prebiotic reactions do not occur in an oxygen-rich atmosphere, which is what we have today. Secondly, if any abiogenetic steps did take place somewhere, the resulting products would quickly be gobbled up by the microbes that inhabit every habitat on earth.
Serious research into the origins of life only really became possible after the discovery of the structure of DNA by Crick and Watson in 1953 and the subsequent unravelling of the genetic code. Much has been learned since then about how living organisms work, and origin-of-life research has accelerated over the last twenty years. Nonetheless it remains an immensely complex and difficult subject. Gradually however we have got to the stage where the problem is not so much that no-one can think of any ways that it could have happened, but rather that there are so many different good ideas floating around that it is difficult to choose the best candidates for further research.
Whether we will ever see a complete experimental demonstration of abiogenesis, starting from inorganic chemicals, is unknown. It may turn out to require a laboratory the size of Lake Victoria and a duration of millions of years. But a detailed description of a plausible route, with experimental demonstration of several of the main steps, may soon be within reach.
It seems to be an insurmountable problem that the links one refers to from pages like this vanish rapidly. I have managed to repair a few broken links, but some of the web resources mentioned below seem to have gone for good, unfortunately.
The title of this essay (and to some extent the idea of writing it at all) was inspired by a sentence from this book, where Francis Crick is explaining some crystallographic theory to Watson:
The rules were in fact so simple that Francis considered writing them up under the title "Fourier Transforms for the Birdwatcher".
The Beginnings of Life on Earth (Christian de Duve) Mainly about a possible role for thioesters in abiogenesis, but includes a long and very readable introduction.
Origins of Life on Earth (Leslie Orgel)
First Cell (Carl Zimmer) (Go to www.discover.com, and follow links to Archive of November 1995.) Account of possible origin of lipid cell walls.
The probability of Abiogenesis (Andrew Ellington) Summary of likelihood of various steps in abiogenesis process via the nucleic-acid-first route.
Lies, Damned Lies, Statistics and Probability of Abiogenesis Calculations (Ian Musgrave) - what's wrong with creationists "abiogenesis is impossible" arguments, and a sketch of some of the steps involved.
Life on earth began at least 3.85 billion years ago (NASA) http://www.fas.org/mars/nn961106.htm
The Origin of Life web site (Michael Russell) Account of possible origin of life at a submarine vent.
ATP and biological energy (M.J. Farabee) Well illustrated description of how ATP operates.
Introduction to Glycolysis (Jon Maber)
Self-Reproducing Molecules Reported by MIT Researchers (Eugene F. Mallove)
Chirality and the Origin of Life (Jeremy Bailey) A suggestion as to why the use by living organisms of left-handed amino acids may not be an accident.
Hypermedia glossary of genetic terms (Birgid Schlindwein)
The Genetic Code (Shaun Black) A listing of the genetic code - i.e. which triplets code which amino acids.
http://www.rpi.edu/~zukerm/MATH-4961/intro/ (Michael Zuker) More detailed version of the genetic code, with characteristics of the 20 amino acids - in tables towards the bottom of the page.
The Comet is Coming (Nigel Calder) 1980.
Abiogenesis (University of Pittsburgh) http://corona.eps.pitt.edu/www_GPS/courses/GEO0871/Cells/abiogenesis.html Summary of university course material - reasonable nutshell account.
The Origin of Life on Earth (University of Virginia) http://www.evsc.virginia.edu/~lkb2e/101/lec22.htm Another quick canter through the basics, if you'll excuse the mixed metaphors. (Actually the link is now broken - this horse appears to have bolted. Never mind, I've got a rustled copy here.)
http://www.csulb.edu/~bruss/courses/biol211A/211Alect10.htm (Brusslan) More lecture notes; useful bits and pieces, though she is a little confused about panspermia.
Experiment Backs Novel Theory on Origin of Life (Nicholas Wade) Brief non-technical account of the work by Wächtershäuser on iron sulphide catalysis.
Molecular Evolution - Book Review (Aristotel Pappelis) Nutshell description (but technical) of the protein-first route. Pappelis is a colleague of Fox, and they are both rather fond of blowing their own trumpet - but that doesn't necessarily mean they're wrong.
How Did Life Evolve? How Did Macromolecules, Metabolic Pathways, and Finally Cells Come into Being? (Peter v. Sengbusch) Example page from a site with an enormous amount of biological information.
Yale Scientists Recreate Molecular "Fossils" (Science Daily / Yale University) Account of artificial DNA enzyme.
Is the Genetic Code Really a Frozen Accident? New Evidence from In Vitro Selection (Robin D. Knight and Laura F. Landweber) http://rnaworld.princeton.edu/~rdknight/Papers/NYAS98.doc - now unavailable.
Hypercycles (V.G. Redko) Lots of mathematics about hypercycles, and a few scattered words as well.
Turning a corner in the search for the origin of life (Peter Wills) Scan of some of the work currently going on. (Another broken link).