DNA, RNA, and Protein Synthesis: The Central Dogma

Molecular biology runs on a single organizing principle so fundamental that Francis Crick named it the "Central Dogma" in 1958 — a phrase that has survived decades of revision and refinement without losing its explanatory power. This page covers the mechanics of how genetic information flows from DNA through RNA to protein, the enzymes and structures that carry out each step, and the places where the textbook version gets complicated. Understanding this flow matters because virtually every modern medical treatment targeting genetics, from mRNA vaccines to CRISPR therapies, depends on precise manipulation of these same pathways.


Definition and scope

The Central Dogma describes the directional transfer of biological sequence information: DNA is transcribed into RNA, and RNA is translated into protein. Crick's original 1958 formulation, later clarified in a 1970 Nature paper, distinguished between general transfers (permitted by all living cells), special transfers (permitted under specific circumstances, such as reverse transcription by retroviruses), and unknown transfers that had not yet been observed.

The scope is deliberately precise. The dogma is a statement about sequence information, not about molecular interactions at large. It does not claim that proteins cannot influence DNA — they obviously do, through transcription factors, histones, and repair enzymes. It claims that proteins cannot encode new sequence information back into DNA. That distinction gets lost in casual descriptions surprisingly often, and the bioscience frequently asked questions section addresses several downstream confusions that arise from it.

In the human genome, the protein-coding exons of approximately 20,000 genes account for about 1.5% of the roughly 3.2 billion base pairs in the haploid genome (National Human Genome Research Institute, NHGRI). The remaining sequence encodes regulatory elements, non-coding RNAs, repetitive elements, and regions whose functions remain under active investigation.
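As a quick check on the scale involved, the figures above can be turned into base-pair counts. This is a back-of-the-envelope sketch: the variable names are chosen here for illustration, and the per-gene average is a derived number, not an NHGRI figure.

```python
# Figures quoted in the text (approximate).
GENOME_BP = 3.2e9             # haploid human genome, base pairs
CODING_FRACTION = 0.015       # ~1.5% of the genome is protein-coding
PROTEIN_CODING_GENES = 20_000

# Total protein-coding sequence implied by those figures.
coding_bp = GENOME_BP * CODING_FRACTION

# Derived illustration only: average coding sequence per gene.
avg_coding_bp_per_gene = coding_bp / PROTEIN_CODING_GENES

print(f"Coding sequence: ~{coding_bp / 1e6:.0f} million bp")
print(f"Average per gene: ~{avg_coding_bp_per_gene:.0f} bp")
```

Under these assumptions the coding portion comes to roughly 48 million base pairs, or about 2,400 bp of coding sequence per gene on average.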


Core mechanics or structure

Replication sits upstream of both transcription and translation. Before a cell divides, DNA polymerase copies the entire genome with an error rate of approximately 1 mistake per 10⁹ to 10¹⁰ nucleotides incorporated. That fidelity comes from accurate base selection and the proofreading exonuclease activity built into the replicative polymerases, backed up by post-replicative mismatch repair (NCBI Bookshelf, Molecular Biology of the Cell, 6th ed.).
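To put that error rate in perspective, it can be multiplied by the genome size quoted earlier on this page. The sketch below is simple arithmetic under that assumption; the function name is illustrative.

```python
HAPLOID_GENOME_BP = 3.2e9  # figure quoted earlier on this page


def expected_errors(genome_bp: float, error_rate: float) -> float:
    """Expected number of misincorporations in one copy of the genome."""
    return genome_bp * error_rate


# Bounds of the quoted range: 1 error per 10^10 to 1 per 10^9 nucleotides.
low = expected_errors(HAPLOID_GENOME_BP, 1e-10)
high = expected_errors(HAPLOID_GENOME_BP, 1e-9)

print(f"Expected errors per haploid genome copy: {low:.2f} to {high:.1f}")
```

In other words, at this fidelity a full copy of the haploid genome is expected to carry somewhere between a fraction of one error and a few errors.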

Transcription converts a DNA template into a complementary messenger RNA (mRNA) strand. In eukaryotes, RNA polymerase II handles most protein-coding genes. The process proceeds in three stages:

  1. Initiation — transcription factors assemble at the promoter, recruiting RNA polymerase to the transcription start site.
  2. Elongation — the polymerase unwinds approximately 12–14 base pairs of DNA at a time and synthesizes RNA in the 5′-to-3′ direction at a rate of roughly 20–30 nucleotides per second in mammalian cells.
  3. Termination — specific sequences signal cleavage and polyadenylation of the transcript.
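The elongation step can be illustrated with a toy string model. This is a minimal sketch, not a simulation of the polymerase: it assumes the template strand is written 3′ to 5′ (the direction in which the polymerase reads it), so the output reads 5′ to 3′. The function name is hypothetical.

```python
# Base-pairing rules for DNA template -> RNA product.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') complementary to a DNA template given 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5.upper())


mrna = transcribe("TACGGT")
print(mrna)  # AUGCCA
```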

The resulting pre-mRNA undergoes 5′ capping, 3′ polyadenylation (a tail of roughly 200 adenine residues), and splicing, where introns are excised by the spliceosome — a ribonucleoprotein complex containing 5 small nuclear RNAs and over 150 protein components (NIH National Institute of General Medical Sciences).
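Splicing and polyadenylation can likewise be sketched as string operations. The example below assumes intron positions are already known and given as half-open (start, end) index pairs; real splice-site recognition is far more involved, and the 5′ cap is a chemical modification with no string equivalent, so it is left out. All names are illustrative.

```python
def process_pre_mrna(pre_mrna: str,
                     introns: list[tuple[int, int]],
                     poly_a_length: int = 200) -> str:
    """Remove introns (half-open index ranges) and append a poly(A) tail."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep sequence before the intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons) + "A" * poly_a_length


# Toy transcript: exon "AUG", intron "GUAAGUCAG", exon "GCU".
mature = process_pre_mrna("AUGGUAAGUCAGGCU", [(3, 12)], poly_a_length=5)
print(mature)  # AUGGCUAAAAA
```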

Translation occurs at the ribosome, a two-subunit molecular machine. In eukaryotes, the small (40S) subunit scans the mRNA for the AUG start codon. The large (60S) subunit catalyzes peptide bond formation through peptidyl transferase activity. Notably, this activity resides in the ribosomal RNA itself, making the ribosome a ribozyme, not a protein enzyme. Transfer RNA (tRNA) molecules ferry individual amino acids, with each tRNA's anticodon pairing with the mRNA codon: strict Watson-Crick pairing at the first two codon positions, with wobble pairing tolerated at the third. Elongation adds approximately 3–5 amino acids per second to the growing chain in mammalian cells.
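The codon-by-codon logic of translation can be sketched in a few lines. This example uses a deliberately small subset of the standard genetic code (just enough codons for the demonstration) and a hypothetical function name. It finds the first AUG, then reads triplets until a stop codon.

```python
# Illustrative subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "GCC": "A", "AAA": "K",
    "UGG": "W", "GGC": "G", "UUU": "F",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"codon {codon} is not in the illustrative table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break  # stop codon: release the chain
        peptide.append(amino_acid)
    return "".join(peptide)


print(translate("GGAUGGCUAAAUGGUAAGC"))  # MAKW
```

A full implementation would carry all 64 codons; libraries such as Biopython provide complete tables.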


Causal relationships or drivers

The flow of information is not passive — it is tightly regulated at every transition. Promoter strength, enhancer proximity, chromatin accessibility, and the concentration of available transcription factors all determine whether and how rapidly a gene is transcribed. The ENCODE Project, a consortium effort coordinated by NHGRI, catalogued over 4 million regulatory elements in the human genome by 2020 (ENCODE Project Consortium, Nature, 2020), dismantling any notion that the genome is mostly inert filler.

Translation efficiency is separately controlled. mRNA stability, codon usage bias, the secondary structure of the 5′ untranslated region, and the availability of ribosomes all modulate how much protein a given mRNA produces. A transcript can be transcribed abundantly and translated rarely — or vice versa — making the proteome a poor direct proxy for the transcriptome.

Post-translational modification adds another regulatory layer: phosphorylation, ubiquitination, glycosylation, and acetylation can activate, silence, relocalize, or mark proteins for degradation, none of which are encoded in the DNA sequence itself.


Classification boundaries

Crick's three categories of information transfer provide a useful taxonomy:

General transfers (occur in all cells): DNA → DNA (replication), DNA → RNA (transcription), RNA → Protein (translation).

Special transfers (occur under specific conditions): RNA → DNA (reverse transcription, carried out by retroviruses including HIV and encoded in LTR retrotransposons that constitute roughly 8% of the human genome); RNA → RNA (RNA replication by RNA-dependent RNA polymerases in RNA viruses).

Transfers not observed: Protein → DNA, Protein → RNA, Protein → Protein in the sense of sequence encoding. Prions are sometimes cited as a counterexample, but prion propagation is a conformational template effect, not a sequence-level information transfer.
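The taxonomy above maps naturally onto a small lookup table. The sketch below encodes only the transfers listed in this section; any pair the text does not mention returns None. The names are illustrative.

```python
from enum import Enum
from typing import Optional


class Transfer(Enum):
    GENERAL = "general"
    SPECIAL = "special"
    NOT_OBSERVED = "not observed"


# (source, target) -> category, as listed in the text above.
CRICK_TRANSFERS = {
    ("DNA", "DNA"): Transfer.GENERAL,          # replication
    ("DNA", "RNA"): Transfer.GENERAL,          # transcription
    ("RNA", "protein"): Transfer.GENERAL,      # translation
    ("RNA", "DNA"): Transfer.SPECIAL,          # reverse transcription
    ("RNA", "RNA"): Transfer.SPECIAL,          # RNA replication
    ("protein", "DNA"): Transfer.NOT_OBSERVED,
    ("protein", "RNA"): Transfer.NOT_OBSERVED,
    ("protein", "protein"): Transfer.NOT_OBSERVED,
}


def classify(source: str, target: str) -> Optional[Transfer]:
    """Look up the category of a sequence-information transfer."""
    return CRICK_TRANSFERS.get((source, target))


print(classify("RNA", "DNA"))      # Transfer.SPECIAL
print(classify("protein", "DNA"))  # Transfer.NOT_OBSERVED
```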

The key dimensions and scopes of bioscience page maps how these molecular-scale processes connect to broader biological organization levels.


Tradeoffs and tensions

The dogma is a framework, and frameworks have edges. Several genuine tensions complicate its application:

Reverse transcription in normal human cells: Human LINE-1 retrotransposons — active in roughly 100 copies per haploid genome — use reverse transcriptase to copy RNA back into DNA and reinsert it. This is not viral infection; it is an endogenous process. Its contribution to somatic mutation and potentially to neuronal diversity is an open research area.

Epitranscriptomics: Chemical modifications to RNA (m⁶A methylation is the most studied, with more than 10,000 sites mapped in the human transcriptome) affect transcript stability, splicing, and translation. These modifications are enzymatically added and removed, adding a layer of information that the dogma's original framing did not anticipate.

Non-coding RNA: The 2024 Nobel Prize in Physiology or Medicine recognized the discovery of microRNAs and their role in post-transcriptional regulation. Small interfering RNAs, long non-coding RNAs, and circular RNAs all influence gene expression without themselves being translated. The dogma's protein-centric endpoint is incomplete for describing cellular information processing in full.

Prions and protein-based inheritance: Yeast prions can propagate heritable phenotypes across generations through conformational templating of proteins. This is a genuine hereditary system that bypasses nucleic acids entirely, though — again — it does not involve sequence information transfer.


Common misconceptions

"The Central Dogma says DNA controls everything." The dogma makes no such claim. It describes information flow directionality, not causal hierarchy. Environment, metabolites, and cellular state regulate DNA accessibility continuously.

"RNA is just a messenger — a passive intermediate." Catalytic RNA (ribozymes), regulatory microRNAs, and structural rRNAs are functionally active molecules. The ribosome's peptidyl transferase center is RNA, not protein.

"Junk DNA is non-functional." The phrase "junk DNA" originated with Susumu Ohno in 1972 and was applied to repetitive sequences. The ENCODE Project's findings shifted consensus toward widespread functional roles for non-coding sequence, though the extent of that function remains actively debated in the literature.

"One gene = one protein." Alternative splicing allows a single gene to produce multiple distinct protein isoforms. The human DSCAM gene in Drosophila melanogaster — a model organism — can theoretically generate over 38,000 isoforms through combinatorial exon selection, as documented by Schmucker et al. in Cell (2000).

"Translation is error-free." Ribosomal misincorporation occurs at roughly 1 error per 10³ to 10⁴ codons — orders of magnitude less accurate than DNA replication. Most misfolded proteins are caught by chaperones or degraded by the proteasome, but translational errors are a real component of cellular protein quality control burden.


Checklist or steps

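The following sequence represents the flow of genetic information from gene to functional protein in a eukaryotic cell:

  1. Transcription: RNA polymerase II and its transcription factors synthesize pre-mRNA from the DNA template in the nucleus.
  2. Pre-mRNA processing: the transcript receives a 5′ cap and a 3′ poly(A) tail, and the spliceosome excises introns.
  3. Nuclear export: the mature mRNA passes through the nuclear pore complex into the cytoplasm.
  4. Translation: the ribosome (40S + 60S subunits) reads the codons while tRNAs deliver the matching amino acids.
  5. Post-translational modification: kinases, glycosyltransferases, ubiquitin ligases, and related enzymes modify the finished chain.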


Reference table or matrix

| Stage | Key Molecule(s) | Location (Eukaryotes) | Rate / Scale | Error Rate |
|---|---|---|---|---|
| Replication | DNA polymerase δ/ε, helicase | Nucleus | ~50 bp/sec (human) | ~1 per 10⁹–10¹⁰ bp |
| Transcription | RNA Pol II, transcription factors | Nucleus | ~20–30 nt/sec | ~1 per 10⁴–10⁵ nt |
| Pre-mRNA Processing | Spliceosome (~150 proteins + 5 snRNAs), capping/polyadenylation enzymes | Nucleus | Minutes to hours | Variable by intron |
| Nuclear Export | Nuclear pore complex (~30 nucleoporins) | Nuclear envelope | Seconds per mRNA | |
| Translation | Ribosome (40S + 60S), tRNA, eIF factors | Cytoplasm / ER | ~3–5 aa/sec | ~1 per 10³–10⁴ codons |
| Post-translational Modification | Kinases, glycosyltransferases, ubiquitin ligases | ER, Golgi, cytoplasm | Variable | |
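The rates in the table translate directly into time estimates. The sketch below divides a length by the quoted rate range; the 30,000-nucleotide gene and 400-residue protein are hypothetical sizes chosen for illustration, and pausing, processing, and export time are ignored.

```python
def duration_range(length: float, rate_range: tuple[float, float]) -> tuple[float, float]:
    """Return (fastest, slowest) time in seconds for a given length and rate range."""
    slow_rate, fast_rate = rate_range
    return length / fast_rate, length / slow_rate


TRANSCRIPTION_NT_PER_S = (20, 30)  # from the table above
TRANSLATION_AA_PER_S = (3, 5)      # from the table above

GENE_LENGTH_NT = 30_000            # hypothetical pre-mRNA length
PROTEIN_LENGTH_AA = 400            # hypothetical protein length

transcription_s = duration_range(GENE_LENGTH_NT, TRANSCRIPTION_NT_PER_S)
translation_s = duration_range(PROTEIN_LENGTH_AA, TRANSLATION_AA_PER_S)

print(f"Transcription: {transcription_s[0]:.0f} to {transcription_s[1]:.0f} s")
print(f"Translation: {translation_s[0]:.0f} to {translation_s[1]:.0f} s")
```

For these example sizes, transcription takes on the order of tens of minutes while translation of the resulting message takes one to two minutes.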

The how-it-works section provides a broader systems view of how these individual molecular processes integrate into cellular behavior.


References