17 Gene Expression Overview

Andrea Bierema


Learning Objectives

Students will be able to:

  • Describe the structure and purpose of DNA and RNA.
  • Describe the general process of protein synthesis.
  • Describe the molecular anatomy of genes and genomes.
  • Identify DNA and mRNA bases and binding patterns.
  • Interpret a codon-amino acid chart.
  • Given a DNA sequence, determine the corresponding mRNA sequence and amino acid sequence.

What is a Gene?

The gene is the basic physical unit of inheritance. Genes are passed from parents to offspring and contain the information needed to specify traits. Genes are arranged, one after another, on structures called chromosomes. A chromosome contains a single, long DNA molecule, only a portion of which corresponds to a single gene, as well as the structural proteins (called histones) that the DNA molecule wraps around. Humans have approximately 20,000 genes arranged on their chromosomes. Watch the following brief video for an animated view of the relationship between chromosomes and genes.

Central Dogma

The central dogma of molecular biology is that DNA codes for RNA and RNA codes for protein. In addition to DNA coding for RNA, much of the DNA regulates the synthesis of RNA, which ultimately means that it regulates the synthesis of protein. We will learn about gene regulation in later chapters.


The central dogma states that DNA is used to make RNA via transcription, which is used to make protein via translation.

Because proteins are coded by genes, the term “gene expression” refers to protein synthesis (i.e., making proteins), including the regulation of that synthesis.

There are two main processes that must occur to synthesize proteins: transcription and translation. During the process of transcription—which occurs in the nucleus—an mRNA molecule is created by reading the DNA. Note that DNA never “becomes” RNA; rather, the DNA is “read” to make an RNA molecule. The mRNA leaves the nucleus and then, through the process of translation, the mRNA is read to create an amino acid sequence that folds into a protein.

Transcription occurs in the nucleus, and translation occurs outside of the nucleus at the ribosomes (which are either free in the cytoplasm or attached to the rough endoplasmic reticulum). Below are two views of this area: a cartoon representation and an electron micrograph.

Part of a circle in the upper right (labeled as the nucleus), with thin rows around it (rough ER). The rows are studded with circles, which are ribosomes. A small structure (a protein) from the rough ER is put into a vesicle (a circle) to be carried to another structure labeled as the Golgi apparatus. This structure has several folds that the protein travels through, and then it leaves the cell. Another structure near the rows around the nucleus is the smooth ER.
Cartoon image of the nucleus and rough ER (it is “rough” because ribosomes are attached to it). This cartoon also shows what can happen to a protein after it is produced: it can leave the cell through vesicles.


Circle in the upper left with dark and light gray splotches in it. It is surrounded by free small circles and then by thin rows.
Electron micrograph of part of the nucleus and the rough endoplasmic reticulum (RER) in an acinar cell from the pancreas of the small brown bat, Myotis lucifugus. The nucleus of the cell is in the upper left corner; the RER in the lower half of the micrograph is stacked and studded with ribosomes. Figure 168 from Chapter 5 (Endoplasmic Reticulum) of ‘The Cell, 2nd Ed.’ by Don W. Fawcett M.D.

Consider what the terms “transcribe” and “translate” mean in relation to language. To “transcribe” something means to rewrite text in the same language, while to “translate” something means to rewrite the text in a different language. The meanings are similar in biology. DNA is transcribed into RNA: both DNA and RNA are nucleic acids (i.e., the same “language”). With the assistance of proteins, DNA is “read” and transcribed into an mRNA sequence. Reading RNA to create protein, though, is called translation: RNA is made of nucleotides, and protein is made of amino acids (i.e., different “languages”). Therefore, DNA is transcribed to create an mRNA sequence, and then the mRNA sequence is translated to make a protein.


The two main types of nucleic acids are deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). As described earlier in this chapter, DNA is the genetic material in all living organisms, ranging from single-celled bacteria to multicellular mammals. In eukaryotes, DNA is found in the nucleus and in two organelles, mitochondria and chloroplasts. In prokaryotes, the DNA is not enclosed in a membranous envelope.

The cell’s entire genetic content is its genome, and the study of genomes is genomics. A single DNA molecule may contain hundreds or even thousands of genes. Many genes contain information to make protein products, with mRNA serving as the intermediary. Other genes code for RNA products. DNA controls all of the cellular activities by turning the genes “on” or “off.”

The other type of nucleic acid, RNA, is mostly involved in protein synthesis. The DNA molecules never leave the nucleus but instead use an intermediary molecule to communicate with the rest of the cell. This intermediary is the messenger RNA (mRNA). Other types of RNA—like rRNA, tRNA, and microRNA—are involved in protein synthesis and its regulation.

DNA and RNA are composed of monomers that scientists call nucleotides. The nucleotides combine with each other to form a polynucleotide, DNA or RNA. Each nucleotide has three components: a nitrogenous base, a pentose (five-carbon) sugar, and a phosphate group. Each nitrogenous base in a nucleotide is attached to a sugar molecule, which is attached to one or more phosphate groups. Therefore, although the terms “base” and “nucleotide” are sometimes used interchangeably, a nucleotide contains a base as well as part of the sugar-phosphate backbone.

RNA (ribonucleic acid) is made of a single-stranded helix of sugar-phosphates and nucleobases. Nucleobases of RNA are cytosine (C), guanine (G), adenine (A), and uracil (U). DNA (deoxyribonucleic acid) is made of a double-stranded helix of sugar-phosphates and base pairs. Nucleobases of DNA are cytosine (C), guanine (G), adenine (A), and thymine (T). Nucleobases vary in molecular structure.
Comparison of the molecular structure of RNA and DNA.

Comparison of RNA (left molecule) and DNA (right molecule). The color of the bases in RNA and DNA aligns with the colored boxes next to each base molecule.
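The base sets shown in the figure can also be written out as a short Python sketch. It assumes sequences are given as plain strings of base letters; the function names `classify` and `paired_strand` are made up for this illustration. It also assumes the standard DNA pairing rule (A pairs with T, and C pairs with G) and, to keep things simple, ignores the direction of each strand.

```python
# Bases found in each nucleic acid, as listed in the figure description.
DNA_BASES = set("ACGT")
RNA_BASES = set("ACGU")

# Standard DNA base pairing: A with T, and C with G.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(sequence: str) -> str:
    """Return 'DNA', 'RNA', 'either', or 'invalid' based on the bases present."""
    bases = set(sequence.upper())
    if not bases:
        return "invalid"
    fits_dna = bases <= DNA_BASES
    fits_rna = bases <= RNA_BASES
    if fits_dna and fits_rna:
        return "either"  # no T or U present, so the bases alone cannot tell
    if fits_dna:
        return "DNA"
    if fits_rna:
        return "RNA"
    return "invalid"


def paired_strand(dna: str) -> str:
    """Return the base that pairs with each position of a DNA strand, left to right."""
    return "".join(DNA_PAIRS[base] for base in dna.upper())


print(classify("ATGGCA"))       # DNA
print(classify("AUGGCA"))       # RNA
print(paired_strand("ATGGCA"))  # TACCGT
```

The only letter that separates the two alphabets is T versus U, which is why a sequence containing neither is reported as "either".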



Protein Synthesis Overview

The two main processes in protein synthesis are transcription and translation. The following is an overview of each of these processes. Each process will be described in more detail in future chapters. Note that the rest of this textbook will focus on what happens in eukaryotic cells. Please see this page by Lumen for details on prokaryotic gene expression.


A gene is complex: it contains not only the code for the resulting protein but also several regulatory elements that determine if and when the region that codes for a protein is read to create protein. What follows is a diagram of the components of a gene that are used in transcription.


This textbook focuses on the DNA and the ending product of transcription: mRNA.


Given a specific DNA strand, what is the sequence of the resulting mRNA molecule? We will learn about how mRNA is created in a later chapter.
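As a rough illustration of this question, here is a minimal Python sketch. It assumes the DNA strand you are given is the strand that is read during transcription (often called the template strand), so each mRNA base is the pairing partner of the DNA base, with uracil (U) used in place of thymine (T). The function name `transcribe` and the example sequence are invented for this sketch, and strand direction is ignored.

```python
# Pairing used when the template DNA strand is read to build mRNA.
# Assumption: the input is the template strand; RNA uses U where DNA would use T.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_dna: str) -> str:
    """Return the mRNA sequence that pairs with a template DNA strand."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_dna.upper())


print(transcribe("TACGGCATT"))  # AUGCCGUAA
```

If you are instead given the other DNA strand (the one that is not read), the mRNA has the same sequence as that strand with every T replaced by U.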


Translation involves different types of RNA (rRNA, tRNA, mRNA, and microRNA), which we will explain in more detail in later chapters.

After an mRNA is created, it leaves the nucleus and associates with a ribosome, which is a complex made of rRNA and polypeptides. Then, at the ribosome, and with the assistance of tRNAs, the mRNA is read and an amino acid sequence is created.

DNA and mRNA create sequences with just four types of bases; yet, these bases code for 20 unique amino acids (the makeup of protein). How is this possible? Watch the following video to find out!


The mRNA is read in sets of three bases known as codons. Each codon codes for a single amino acid. In this way, the mRNA is read and the protein product is made.
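To see why sets of three are enough, the short Python sketch below counts every possible three-base combination and splits an example mRNA into codons. The function name `split_into_codons` and the example sequence are made up for illustration, and the sketch assumes reading starts at the first base of the string.

```python
from itertools import product

BASES = "ACGU"  # the four mRNA bases


def split_into_codons(mrna: str) -> list[str]:
    """Split an mRNA string into codons of three bases, dropping any leftover bases."""
    mrna = mrna.upper()
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


# Every possible ordered combination of three bases: 4 * 4 * 4 = 64.
all_codons = ["".join(combo) for combo in product(BASES, repeat=3)]

print(len(all_codons))                 # 64
print(split_into_codons("AUGCCGUAA"))  # ['AUG', 'CCG', 'UAA']
```

With 64 possible codons and only 20 amino acids, there are more than enough codons to go around, so several codons can code for the same amino acid.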

Below is a chart showing which codons code for which amino acids. There are two representations; move to the next slide for the second representation.

These charts can be a little confusing at first. Watch the following video to learn how to interpret both chart formats.
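A codon chart is essentially a lookup table, so reading one can be sketched in Python. The sketch below builds the standard genetic code as a dictionary and uses it to read an example mRNA. It makes several simplifying assumptions: reading starts at the first base of the string, reading ends at the first stop codon, and amino acids are reported with their common three-letter abbreviations. The names `CODON_TABLE` and `translate` are invented for this illustration.

```python
from itertools import product

# Standard genetic code, listed in the order UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid codes; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Three-letter abbreviations, closer to what most codon charts print.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna: str) -> list[str]:
    """Look up each codon in turn and stop at the first stop codon."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: no amino acid is added
            break
        protein.append(THREE_LETTER[amino_acid])
    return protein


print(translate("AUGCCGUAA"))  # ['Met', 'Pro']
```

Looking up a codon in the dictionary mirrors what you do by hand with either chart format: find the first base, then the second, then the third, and read off the amino acid.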



This chapter focused on DNA, mRNA, and protein sequences. The next several chapters describe gene expression processes, both protein synthesis and the regulation of that synthesis. Master how sequences are read during protein synthesis (the focus of the current chapter) before moving on to the next chapter. Below are some sources to help further your understanding!


Check out Learn.Genetics’ “How a Firefly’s Tail Makes Light” video for an overview of protein synthesis!


Need a little more practice?

Try out Learn.Genetics’ “Transcribe and Translate a Gene” and The Concord Consortium’s “DNA to Protein” interactives for further practice!


This chapter is a modified derivative of the following articles:

“Gene” by National Human Genome Research Institute, National Institutes of Health, Talking Glossary of Genetic Terms.

“Nucleic Acids” by OpenStax College, Biology 2e, CC BY 4.0. Download the original article at https://openstax.org/books/biology-2e/pages/3-5-nucleic-acids