17 Gene Expression Overview

Andrea Bierema


Learning Objectives

Students will be able to:

  • Describe the structure and purpose of DNA and RNA.
  • Describe the general process of protein synthesis.
  • Describe the molecular anatomy of genes and genomes.
  • Identify DNA and mRNA bases and binding patterns.
  • Interpret a codon-amino acid chart.
  • Given a DNA sequence, determine the corresponding mRNA sequence and amino acid sequence.

What is a Gene?

The gene is the basic physical unit of inheritance. Genes are passed from parents to offspring and contain the information needed to specify traits. Genes are arranged, one after another, on structures called chromosomes. A chromosome contains a single, long DNA molecule (only a portion of which corresponds to any single gene), as well as the structural proteins, called histones, that the DNA molecule wraps around. Humans have approximately 20,000 genes arranged on their chromosomes. Watch the following brief video for an animated view of the relationship between chromosomes and genes.

Central Dogma

The central dogma of molecular biology is that DNA codes for RNA and RNA codes for protein. In addition to coding for RNA, much of the DNA regulates the synthesis of RNA, which ultimately means that it regulates the synthesis of protein. We will learn about gene regulation in later chapters.


The central dogma states that DNA is used to make RNA via transcription, and RNA is used to make protein via translation.

Because proteins are coded by genes, the term “gene expression” refers to protein synthesis (i.e., making proteins), including the regulation of that synthesis.

There are two main processes that must occur to synthesize proteins: transcription and translation. During transcription, an mRNA molecule is created by reading the DNA. Note that DNA never “becomes” RNA; rather, the DNA is “read” to make an RNA molecule. The mRNA then leaves the nucleus, and during translation, the mRNA is read to create a chain of amino acids that folds into a protein.

Transcription occurs in the nucleus, and translation occurs outside the nucleus at the ribosomes (which are either free in the cytoplasm or attached to the rough endoplasmic reticulum). Below are two images of this area: a cartoon representation and an electron micrograph.

Cartoon image of the nucleus (upper left half circle) and the rough endoplasmic reticulum (ER; it is “rough” because ribosomes are attached to it). The ER is shown as the thin rows in the image, studded with circles that represent ribosomes. The cartoon also shows what can happen to a protein after it is produced: the new protein is packaged into a transport vesicle and carried to the Golgi apparatus for modification, then carried to the plasma membrane in another transport vesicle and released from the cell.


Electron micrograph of part of the nucleus and the rough endoplasmic reticulum (RER) in an acinar cell from the pancreas of the small brown bat, Myotis lucifugus. The nucleus of the cell is in the upper left corner; the RER in the lower half of the micrograph is stacked and studded with ribosomes. Figure 168 from Chapter 5 (Endoplasmic Reticulum) of ‘The Cell, 2nd Ed.’ by Don W. Fawcett M.D. Image sourced from The Cell Image Library.

Consider what the terms “transcribe” and “translate” mean in relation to language. To “transcribe” something means to rewrite it in the same language, while to “translate” something means to rewrite it in a different language. Biology uses these terms the same way. DNA is transcribed into RNA: both DNA and RNA are nucleic acids (i.e., the same “language”). With the assistance of proteins, DNA is “read” and transcribed into an mRNA sequence. RNA, however, is translated into protein: RNA is a nucleic acid, whereas protein is made of amino acids (i.e., a different “language”). Therefore, DNA is transcribed to create an mRNA sequence, and then the mRNA sequence is translated to make a protein.


The two main types of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA is the genetic material in all living organisms, ranging from single-celled bacteria to multicellular mammals. In eukaryotes, DNA is found in the nucleus as well as in the mitochondria and (in plants and algae) the chloroplasts. In prokaryotes, the DNA is not enclosed in a membranous envelope.

The cell’s entire genetic content is its genome, and the study of genomes is genomics. In eukaryotic cells, DNA forms a complex with histone proteins, and a single chromosome may contain thousands of genes. Many genes contain the information to make protein products (by way of an mRNA intermediate). Other genes code for RNA products that are not translated into protein, such as rRNA and tRNA. DNA controls cellular activities by turning genes “on” or “off.”

The other type of nucleic acid, RNA, is mostly involved in protein synthesis. In eukaryotes, the DNA in the nucleus never leaves it; instead, it uses an intermediary molecule to communicate with the rest of the cell. This intermediary is messenger RNA (mRNA). Other types of RNA, such as rRNA, tRNA, and microRNA, are involved in protein synthesis and its regulation.

DNA and RNA are composed of monomers called nucleotides. Nucleotides join with each other to form a polynucleotide (DNA or RNA). Each nucleotide has three components: a nitrogenous base, a pentose (five-carbon) sugar, and a phosphate group. Each nitrogenous base in a nucleotide is attached to a sugar molecule, which is attached to one or more phosphate groups. Therefore, although the terms “base” and “nucleotide” are sometimes used interchangeably, a nucleotide contains a base as well as part of the sugar-phosphate backbone.

Comparison of the molecular structure of RNA (left molecule) and DNA (right molecule). The color of the bases in RNA and DNA aligns with the colored boxes next to each base molecule. Nucleobases of RNA are cytosine (C), guanine (G), adenine (A), and uracil (U). Nucleobases of DNA are cytosine (C), guanine (G), adenine (A), and thymine (T). Notice the difference in the number of strands and types of bases.
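Because thymine (T) appears only in DNA and uracil (U) appears only in RNA, those two bases are enough to tell a DNA sequence from an RNA sequence. The Python sketch below illustrates that rule. The function name `classify_sequence` is hypothetical, and it assumes sequences are written with one-letter base codes; a sequence made only of A, C, and G cannot be classified from its bases alone.

```python
DNA_BASES = set("ACGT")  # cytosine, guanine, adenine, thymine
RNA_BASES = set("ACGU")  # cytosine, guanine, adenine, uracil


def classify_sequence(seq: str) -> str:
    """Return 'DNA', 'RNA', or 'ambiguous' for a string of base letters.

    Hypothetical helper: T occurs only in DNA and U only in RNA,
    so the presence of either letter decides the answer.
    """
    bases = set(seq.upper())
    unknown = bases - DNA_BASES - RNA_BASES
    if unknown:
        raise ValueError(f"not a base letter: {sorted(unknown)}")
    if "T" in bases and "U" in bases:
        raise ValueError("a sequence cannot contain both T and U")
    if "T" in bases:
        return "DNA"
    if "U" in bases:
        return "RNA"
    return "ambiguous"  # only A, C, and/or G: could be either
```

For example, `classify_sequence("ATGCCT")` returns `"DNA"` and `classify_sequence("AUGCCU")` returns `"RNA"`.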




Protein Synthesis Overview

The two main processes in protein synthesis are transcription and translation. The following is an overview of each of these processes. Each process will be described in more detail in future chapters. Note that the rest of this textbook will focus on what happens in eukaryotic cells. Please see this page by Lumen for details on prokaryotic gene expression.


A gene is complex: it contains not only the code for the resulting protein but also regulatory regions that help determine whether and when the protein-coding region is read to make protein. What follows is a diagram of the components of a gene that are used in transcription.


This textbook focuses on the DNA and the end product of transcription: mRNA.


Given a specific DNA strand, what is the sequence of the resulting mRNA molecule? We will learn about how mRNA is created in a later chapter.
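During transcription, each RNA base pairs with a complementary base on the DNA template strand: DNA A pairs with RNA U, DNA T with RNA A, DNA C with RNA G, and DNA G with RNA C. The Python sketch below applies those pairing rules. The name `transcribe` is hypothetical, and the sketch assumes the template strand is written 3' to 5' from left to right, so the mRNA (built 5' to 3') can be read off base by base in the same order.

```python
# DNA base on the template strand -> complementary base in the mRNA
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA sequence complementary to a DNA template strand.

    Assumes the template strand is written 3' to 5' left to right,
    so the output reads 5' to 3' in the same left-to-right order.
    """
    try:
        return "".join(TEMPLATE_TO_MRNA[base] for base in template_strand.upper())
    except KeyError as err:
        raise ValueError(f"not a DNA base: {err.args[0]!r}") from None
```

For example, `transcribe("TACGGATTC")` returns `"AUGCCUAAG"`. Note that the output contains U wherever the RNA pairs with an A in the DNA; the mRNA never contains T.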


Translation involves several types of RNA, which we will explain in more detail in later chapters: rRNA, tRNA, mRNA, and microRNA (which regulates translation).

After an mRNA is created, it leaves the nucleus and binds to a ribosome, a complex made of rRNA and proteins. In the ribosome, and with the assistance of tRNAs, the mRNA is read and a chain of amino acids is built.

DNA and mRNA sequences are built from just four types of bases, yet these bases code for 20 different amino acids (the building blocks of proteins). How is this possible? Watch the following video to find out!


The mRNA is read in sets of three bases known as codons. Each codon codes for a single amino acid or signals the end of translation (a stop codon). In this way, the mRNA is read and the protein product is made.
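The reason codons are three bases long comes down to counting: with four bases, a one-base code spells only 4 different “words” and a two-base code only 16, both fewer than the 20 amino acids, while a three-base code spells 64. The Python sketch below checks that arithmetic and splits an mRNA string into codons. The names `combos`, `ALL_CODONS`, and `split_into_codons` are hypothetical, and the splitter assumes reading begins at the very first base.

```python
from itertools import product

BASES = "UCAG"  # the four mRNA bases

# How many different "words" can codes of 1, 2, or 3 bases spell?
combos = {length: len(BASES) ** length for length in (1, 2, 3)}
# combos == {1: 4, 2: 16, 3: 64}; only length 3 gives at least 20 options

# Every possible three-base codon (64 of them)
ALL_CODONS = ["".join(bases) for bases in product(BASES, repeat=3)]


def split_into_codons(mrna: str) -> list[str]:
    """Split an mRNA string into consecutive, non-overlapping codons.

    Assumes reading begins at the first base; leftover bases at the end
    that cannot form a complete codon are ignored.
    """
    usable_length = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable_length, 3)]
```

For example, `split_into_codons("AUGCCUAAG")` returns `["AUG", "CCU", "AAG"]`. Since 64 codons are available for 20 amino acids plus stop signals, several codons can share the same meaning, as the chart below shows.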

Below is a table showing which codons code for which amino acids.

Codon Chart

Codon Amino Acid
UUU Phenylalanine (Phe)
UUC Phenylalanine (Phe)
UUA Leucine (Leu)
UUG Leucine (Leu)
CUU Leucine (Leu)
CUC Leucine (Leu)
CUA Leucine (Leu)
CUG Leucine (Leu)
AUU Isoleucine (Ile)
AUC Isoleucine (Ile)
AUA Isoleucine (Ile)
AUG Methionine (Met)
GUU Valine (Val)
GUC Valine (Val)
GUA Valine (Val)
GUG Valine (Val)
UCU Serine (Ser)
UCC Serine (Ser)
UCA Serine (Ser)
UCG Serine (Ser)
CCU Proline (Pro)
CCC Proline (Pro)
CCA Proline (Pro)
CCG Proline (Pro)
ACU Threonine (Thr)
ACC Threonine (Thr)
ACA Threonine (Thr)
ACG Threonine (Thr)
GCU Alanine (Ala)
GCC Alanine (Ala)
GCA Alanine (Ala)
GCG Alanine (Ala)
UAU Tyrosine (Tyr)
UAC Tyrosine (Tyr)
UAA Stop (not an amino acid)
UAG Stop (not an amino acid)
CAU Histidine (His)
CAC Histidine (His)
CAA Glutamine (Gln)
CAG Glutamine (Gln)
AAU Asparagine (Asn)
AAC Asparagine (Asn)
AAA Lysine (Lys)
AAG Lysine (Lys)
GAU Aspartic Acid (Asp)
GAC Aspartic Acid (Asp)
GAA Glutamic Acid (Glu)
GAG Glutamic Acid (Glu)
UGU Cysteine (Cys)
UGC Cysteine (Cys)
UGA Stop (not an amino acid)
UGG Tryptophan (Trp)
CGU Arginine (Arg)
CGC Arginine (Arg)
CGA Arginine (Arg)
CGG Arginine (Arg)
AGU Serine (Ser)
AGC Serine (Ser)
AGA Arginine (Arg)
AGG Arginine (Arg)
GGU Glycine (Gly)
GGC Glycine (Gly)
GGA Glycine (Gly)
GGG Glycine (Gly)

Codon chart of triplet mRNA base codes and corresponding amino acids.
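The codon chart can also be stored as a lookup table and used to translate an mRNA sequence. In the Python sketch below (all names hypothetical), the 64 entries are packed into a 64-letter string of one-letter amino acid codes, ordered with the first base changing slowest and the third base fastest, each in the order U, C, A, G (the same order the common 4×4 grid chart uses). Each letter is then mapped to the three-letter abbreviation used in the chart. The `translate` function assumes the string already begins at the first codon to be read, and it stops at the first stop codon.

```python
from itertools import product

BASES = "UCAG"  # base order used by the common 4x4 grid chart

# Standard genetic code, one letter per codon in grid order; '*' = stop
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# product() yields codons with the first base varying slowest,
# matching the order of ONE_LETTER
CODON_CHART = {
    "".join(codon): THREE_LETTER[letter]
    for codon, letter in zip(product(BASES, repeat=3), ONE_LETTER)
}


def translate(mrna: str) -> list[str]:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_CHART[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein
```

For example, `translate("AUGCCUAAGUAA")` returns `["Met", "Pro", "Lys"]`: UAA is a stop codon, so nothing after it is added. Building the table from a compact string avoids typing 64 entries by hand, which is an easy place to drop or misplace a codon.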

The following are two representations of the information in the above table. Both formats are commonly used in biology textbooks.

These charts can be a little confusing at first. Watch the following video to learn how to interpret both chart formats.



This chapter focused on DNA, mRNA, and protein sequences. The next several chapters describe gene expression processes, including both protein synthesis and the regulation of that synthesis. Master how sequences are read during protein synthesis (the focus of this chapter) before moving on to the next chapter. Below are some sources to help further your understanding!


Check out Learn.Genetics’ “How a Firefly’s Tail Makes Light” video for an overview of protein synthesis!


Need a little more practice?

Try out Learn.Genetics’ “Transcribe and Translate a Gene” and The Concord Consortium’s “DNA to Protein” interactives for further practice!


This chapter is a modified derivative of the following articles:

“Gene” by National Human Genome Research Institute, National Institutes of Health, Talking Glossary of Genetic Terms.

“Nucleic Acids” by OpenStax College, Biology 2e, CC BY 4.0. Download the original article at https://openstax.org/books/biology-2e/pages/3-5-nucleic-acids
