Ready For Print ZOO3649 Course Guide PDF
Department of Zoology
Email: yoshan.moodley@univen.ac.at
Email: khomotsonkadimeng38@gmail.com
ZOO2648 is not a prerequisite, but if you attended it you will find it very useful.
READING
*PDF versions of both books are available from the course assistant
ASSESSMENT
This module is worth a total of 28 credits. It will be assessed as follows:
2x Assignments – to be given during the practical sessions for completion during the
week. Assignments must be handed in exactly one week after they are given. Each assignment is
worth 10% of your final mark, so please take them seriously.
1x Literature review and presentation. Each student will be given a different scientific
paper, and they will have to present the paper using a PowerPoint presentation the
following week. This will count for 10% of your final mark.
3x Tests – These will also be given during the practical sessions. Each one is worth 10%
of your final mark.
Students are required to attend all lectures and practical sessions. Students must have a
minimum of 70% attendance to be able to write the final exam. You will be required to sign a
class attendance sheet at each practical session.
TABLETS
You MUST bring your university tablets, fully charged, to all lectures and practical sessions.
Software to be used in the practical sessions (see practical timetable) must be obtained from
the course assistant and installed on your tablets before the relevant practical session.
Evolution, therefore, will be the central theme of this module and understanding it will be
essential to achieving the module outcomes below.
Module Structure
First you will be given a brief introductory lecture to evolutionary genetics and evolutionary
thinking, which will help you to deal with the ideas and concepts that you will encounter
throughout the module. The first section “The Central Dogma” will teach you about the basic
molecular genetics of DNA and RNA, the material upon which all life is based. After this, in the
section “Recombinant DNA Technology”, we will explore the key developments arising from
this molecular foundation that have revolutionised our understanding of genetics and allowed
us to apply genetic tools in research. Then you will discover how molecular mechanisms can
generate the variation of life we see around us in the section “Genetic Variation”. You will then
learn about the “Forces of Evolution” that act upon genetic variation, and the main historical
foundations of modern genetics. In the section “Genetic Structure”, you will learn how these
forces of evolution have shaped and partitioned genetic variation into populations and species.
Lastly, in the section “Molecular Ecology”, you will find out how all the information you have
gained thus far is put into practice in the real world of research.
There will be approximately 51 lectures and 27 Practical sessions. Tests and presentations will
be included within the practical sessions. That makes a total of approximately 132 teaching
hours.
1. Synthesize and incorporate the fundamentals of molecular genetics with the variation
of the natural world around us.
2. Define and describe theoretical and historical foundations of evolutionary genetics
3. Identify, describe, distinguish, compare and analyze mechanisms and fundamental
factors (mutation, genetic drift, selection, migration) and their interactions that create
diversification within and between populations and lead to evolutionary change
4. Use empirical methods and tools to describe levels and patterns of genetic diversity and
differentiation in populations and to infer and assess population and evolutionary
genetic structure.
5. Students will be able to reflect before and after the course about how it impacts their
future practice in terms of their ability to carry out post-graduate research.
TIMETABLE
Lectures will be 50 minutes long and will be held four times a week at 11.00 on Mondays,
Tuesdays and Thursdays in FF-017 (Life Sciences and Chemistry Building). On Wednesdays the
lecture will be at 13.00 in the same venue.
Practical sessions will be three hours long and will be held twice a week from 14.00-17.00 on
Tuesdays and Thursdays. Venue to be announced.
Assignments will be handed out and collected during the practical sessions.
It is really in your best interest to come to all the practicals and lectures.
This first section of the course will focus on the basic molecules of inheritance – deoxyribonucleic acid
or DNA. You will learn how it was discovered, how it is organised in the genome and how it replicates.
Francis Crick called the transfer of this DNA-encoded genetic information between the three basic
molecules of life, that is between DNA, RNA and proteins, the central dogma of molecular biology.
Chapters 1, 5, 6, Hartl and Jones, Genetics – Principles and Analysis (4th Edition).
Section Outcomes.
What is Genetics?
Genetics is the study of the biological process by which a parent passes genes on to their offspring. In other
words, genetics is simply the study of inheritance or heredity. Every child inherits genes from both of
their biological parents and these genes in turn express specific traits.
DNA (or deoxyribonucleic acid) is the molecule that carries the genetic information in all cellular forms
of life and some viruses. It belongs to a class of molecules called the nucleic acids, which are
polynucleotides - that is, long chains of nucleotides.
a nitrogenous base: cytosine (C), guanine (G), adenine (A) or thymine (T)
a phosphate molecule
The backbone of the polynucleotide is a chain of sugar and phosphate molecules. Each of the sugar
groups in this sugar-phosphate backbone is linked to one of the four nitrogenous bases.
Strand of polynucleotides
The double helix of the complete DNA molecule resembles a spiral staircase, with two sugar phosphate
backbones and the paired bases in the centre of the helix. This structure explains two of the most
important properties of the molecule. First, it can be copied or 'replicated', as each strand can act as a
template for the generation of the complementary strand. Second, it can store information in the linear
sequence of the nucleotides along each strand.
It is a common misconception that James Watson and Francis Crick discovered DNA in the 1950s. In
reality, DNA was discovered decades before. It was by following the work of the pioneers before them
that James and Francis were able to come to their ground-breaking conclusion about the structure of
DNA in 1953.
The molecule now known as DNA was first identified in the 1860s by a Swiss chemist called Johann
Friedrich Miescher. Johann set out to research the key components of white blood cells, part of our
body’s immune system. The main source of these cells was pus-coated bandages collected from a
nearby medical clinic.
Johann carried out experiments using salt solutions to understand more about what makes up white
blood cells. He noticed that, when he added acid to a solution of the cells, a substance separated from
the solution. This substance then dissolved again when an alkali was added. When investigating this
substance he realised that it had unexpected properties different to those of the other proteins he was
familiar with. Johann called this mysterious substance ‘nuclein’, because he believed it had come from
the cell nucleus. Unbeknown to him, Johann had discovered the molecular basis of all life – DNA. He
then set about finding ways to extract it in its pure form.
Albrecht Kossel later identified the bases that make up nucleic acids. This work was rewarded in 1910 when he received the Nobel Prize in
Physiology or Medicine.
Crick's use of the word dogma was unconventional, and has been controversial.
gene = length of DNA coding for a functional RNA product (note: this definition includes both DNA coding for proteins, where mRNA is the functional RNA product, and DNA coding for rRNAs and tRNAs)
locus (pl. loci) = position of a gene on a chromosome; often used almost synonymously with gene
alleles = versions of a gene that differ in their base sequences
The gene is the basic physical and functional unit of heredity. It consists of a specific sequence of
nucleotides at a given position on a given chromosome that codes for a specific protein (or, in some
cases, an RNA molecule).
regulatory sequences, which play a role in determining when and where the protein is
made (and how much is made)
A human being has 20,000 to 25,000 genes located on 46 chromosomes (23 pairs).
Unique or single-copy genes code for mRNA which codes for polypeptides
Highly repetitive sequences, also known as satellite DNA, constitute 5-45% of the genome.
Sequences are 5-300 base pairs per repeat, and may be repeated up to 10,000 times per genome. The
function of repetitive DNA is not known. Since repetitive sequences vary from person to person, they
are useful in DNA profiling, which allows for DNA fingerprinting to identify samples from individuals.
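To illustrate why repeat counts are useful for telling individuals apart, the sketch below counts the longest run of back-to-back copies of a repeat unit in two sequences. The repeat unit GGAAT, the two example sequences and the function name are hypothetical choices made for this example only; it is a simplified sketch, not a description of a real profiling workflow.

```python
def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    best = 0
    for start in range(len(sequence) - len(motif) + 1):
        run = 0
        position = start
        # Count consecutive copies of the motif beginning at this start position.
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best


# Hypothetical sequences from two individuals at the same repetitive region.
person_a = "CC" + "GGAAT" * 5 + "TT"
person_b = "CC" + "GGAAT" * 8 + "TT"

print(longest_tandem_run(person_a, "GGAAT"))
print(longest_tandem_run(person_b, "GGAAT"))
```

The two individuals carry the same repeat unit but a different number of copies, and it is this difference in copy number that a DNA profile records.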
The genes, plus all the other DNA in the cell, are known collectively as the genome.
Chromosomes
DNA base pairs are organised into genes, and genes along with other DNA are organised into
chromosomes. The chromosomes contain genes just like pages of a book. Some chromosomes may
Eukaryotic chromosomes
The label eukaryote is taken from the Greek for 'true nucleus', and eukaryotes (all organisms except
viruses, Eubacteria and Archaea) are defined by the possession of a nucleus and other membrane-
bound cell organelles.
This is because the DNA is tightly packed into structures called chromosomes, which consist of long
chains of DNA and associated proteins known as chromatin. In eukaryotes, DNA molecules are tightly
wound around proteins - called histone proteins - which provide structural support and play a role in
controlling the activities of the genes. A strand 150 to 200 nucleotides long is wrapped twice around a
core of eight histone proteins to form a structure called a nucleosome. The histone octamer at the
centre of the nucleosome is formed from two units each of histones H2A, H2B, H3, and H4. The chains
of histones are coiled in turn to form a solenoid, which is stabilised by the histone H1. Further coiling of
the solenoids forms the structure of the chromosome proper.
Each chromosome has a p arm and a q arm. The p arm (from the French word 'petit', meaning small) is
the short arm, and the q arm (the next letter in the alphabet) is the long arm. In their replicated form,
each chromosome consists of two chromatids.
A photograph of the chromosomes in a cell is known as a karyotype. The autosomes are numbered 1-22
in decreasing size order.
Prokaryotic chromosomes
The prokaryotes (Greek for 'before nucleus' - including Eubacteria and Archaea) lack a discrete nucleus,
and the chromosomes of prokaryotic cells are not enclosed by a separate membrane.
Most bacteria contain a single, circular chromosome. (There are exceptions: some bacteria - for
example, the genus Streptomyces - possess linear chromosomes, and Vibrio cholerae, the causative
agent of cholera, has two circular chromosomes.) The chromosome - together with ribosomes and
The genomes of prokaryotes are compact compared with those of eukaryotes, as they lack introns, and
the genes tend to be expressed in groups known as operons. The circular chromosome of the bacterium
Escherichia coli consists of a DNA molecule approximately 4.6 million nucleotides long.
In addition to the main chromosome, bacteria are also characterised by the presence of extra-
chromosomal genetic elements called plasmids. These relatively small circular DNA molecules usually
contain genes that are not essential to growth or reproduction.
In the early 1900s, the work of Gregor Mendel was rediscovered and his ideas about inheritance began
to be properly appreciated. As a result, a flood of research began to try and prove or disprove his
theories of how physical characteristics are inherited from one generation to the next.
In the middle of the nineteenth century, Walther Flemming, an anatomist from Germany, discovered a
fibrous structure within the nucleus of cells. He named this structure ‘chromatin’, but what he had
actually discovered is what we now know as chromosomes. By observing this chromatin, Walther
correctly worked out how chromosomes separate during cell division, also known as mitosis.
The chromosome theory of inheritance was developed primarily by Walter Sutton and Theodor Boveri.
They first presented the idea that the genetic material passed down from parent to child is within the
chromosomes. Their work helped explain the inheritance patterns that Gregor Mendel had observed
several decades before.
American graduate, Walter Sutton, expanded on Theodor’s observation through his work with the
grasshopper. He found it was possible to distinguish individual chromosomes undergoing meiosis in the
testes of the grasshopper and, through this, he correctly identified the sex chromosome. In the closing
Extranuclear DNA
In order for prokaryotic and eukaryotic cells to divide and proliferate, they must make copies of
themselves. The genetic information in the cell must therefore also be duplicated.
This process is called DNA replication. It is a complex process involving many enzymes.
The DNA being replicated must be in a ready state for the start of replication, and there also has to be a
clear start point (the origin of replication) from which replication proceeds. As each piece of DNA must
only be copied once, there also has to be an end point to replication.
DNA replication must be carried out accurately, with an efficient proof reading and repair mechanism in
case of any mismatches or errors. And finally, the system of replication must also be able to distinguish
between the original DNA template and the newly copied DNA.
In order to be able to put these principles into context, it is helpful to look at the eukaryotic cell cycle to
see where the main checkpoints are in the process.
Actively dividing eukaryote cells pass through a series of stages known collectively as the cell cycle: two
gap phases (G1 and G2); an S (for synthesis) phase, in which the genetic material is duplicated; and an M
phase, in which mitosis partitions the genetic material and the cell divides.
S phase. DNA synthesis replicates the genetic material. Each chromosome now consists of two sister
chromatids.
G2 phase. Metabolic changes assemble the cytoplasmic materials necessary for mitosis and cytokinesis.
The period between mitotic divisions - that is, G1, S and G2 - is known as interphase.
The main checkpoints in DNA replication occur between G1 phase and S phase, at the start of mitosis
(M phase), and finally between M phase and G1 phase, when the decision is made whether to go
quiescent or not.
Must be ready: G1
Must ensure that each piece of DNA is only replicated once, so need to know where to
end: Replicon
DNA Replication
During DNA replication, each DNA strand is used as a template to synthesize the second DNA strand.
A DNA strand is ALWAYS synthesized in the 5' to 3' direction.
DNA replication is semi-conservative: in the "next generation" molecule one strand is "old" and the other
is "new".
Replication of DNA is bidirectional. Two Y-shaped replication forks are moving in opposite directions
during DNA replication. It is not trivial to replicate both DNA strands in the 5' to 3' direction! Why?
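The template and semi-conservative rules above can be sketched in a few lines of code. This is a minimal illustration only: the example sequence and the function names are hypothetical, every strand is written in the 5' to 3' direction, and the enzymes involved are ignored.

```python
# Watson-Crick pairing rules for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template):
    """Build the strand that pairs with template.

    Both strands are written 5' to 3'. Because the two strands of a duplex
    run in opposite directions, the template is read in reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template))


def replicate(duplex):
    """Return two daughter duplexes, each with one old and one new strand."""
    strand_a, strand_b = duplex
    return (
        {"old": strand_a, "new": new_strand(strand_a)},
        {"old": strand_b, "new": new_strand(strand_b)},
    )


# Hypothetical parental duplex.
parent = ("ATGGCATC", new_strand("ATGGCATC"))
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)
print(daughter_2)
```

Each daughter keeps one parental strand unchanged, and the newly built strand in each daughter has the same sequence as the other parental strand.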
DNA Polymerase: Matches the correct nucleotide and then joins adjacent nucleotides together
Primase: Provides an RNA primer to start polymerisation
Ligase: Joins adjacent DNA strands together
Helicase: Unwinds the DNA and melts it
Single Strand Binding Proteins: Keep the DNA single stranded after it has been melted by
helicase
Gyrase: A topoisomerase that relieves torsional strain in the DNA molecule
Telomerase: Finishes off the ends of the DNA strand
There is much conservation between the prokaryotic and eukaryotic replication systems: the enzymology, the replication
fork geometry, the basic fundamental features and the use of multi-protein machinery are all very much
the same in both. However, there are more protein components in the eukaryotic replication
machinery. In prokaryotes, the replication fork moves 10 times faster than in eukaryotes.
Transcription literally means the act or process of making a copy of something. Legal secretaries, for
example, transcribe the taped conversations between lawyers and clients by typing them into a word-
processing program.
In genetics, “transcription” refers to the copying of a DNA sequence into an RNA sequence
The structure of DNA is not altered as a result of this process, and it continues to store
information.
Genes are defined as DNA sequences that are transcribed into RNA
The 'information' part of DNA is the nitrogenous base, as opposed to the pentose sugar or the
phosphate residues. In a single-stranded molecule, this important part would be exposed to the cellular
environment, providing more opportunity for it to be mutated by the various chemicals there. In a
double-stranded configuration, however, the two nitrogenous bases are locked within the complex,
facing each other in the centre of the molecule. This organisation helps to safeguard the blueprint DNA
from local mutagens. RNA is a copy of the DNA blueprint and only exists temporarily in the cell, and
therefore does not require this protection, hence it is single stranded.
RNA is fundamentally single-stranded and therefore only one strand of the DNA is actually copied into
RNA during transcription.
The strand that is actually being copied is termed the template strand or antisense strand.
The RNA transcript will have the opposite polarity and the complementary sequence to
this strand
The base sequence of this strand is identical in polarity and sequence to the RNA
transcript
Because the coding strand has the same sequence and polarity as the RNA, it is said to
“carry the gene.”
[Figure: template strand and coding strand]
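The relationship between the template strand, the coding strand and the RNA transcript can be checked with a short sketch. The eight-base gene fragment and the function names below are hypothetical, and both DNA strands are written 5' to 3'.

```python
# Base pairing between a DNA template and the RNA being made (U replaces T).
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand):
    """Copy the template (antisense) strand into RNA.

    The transcript has the opposite polarity to the template, so the
    template is read in reverse to give the RNA written 5' to 3'.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template_strand))


def rna_from_coding(coding_strand):
    """The coding strand matches the RNA, except that T is replaced by U."""
    return coding_strand.replace("T", "U")


# Hypothetical gene fragment: the two strands of the same duplex.
coding_strand = "ATGGCATC"
template_strand = "GATGCCAT"

print(transcribe(template_strand))
print(rna_from_coding(coding_strand))
```

Both calls give the same RNA sequence, which is why the coding strand is said to "carry the gene" even though it is the template strand that is actually copied.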
Once they are made, RNA transcripts play many different functional roles in the cell:
Since our molecular understanding of gene transcription came from studies involving bacteria (mostly E.
coli) and the viruses that infect them (bacteriophages), we will start there.
More precisely, they direct the exact location for the initiation of transcription
Promoters are typically located just “upstream” (5’) of the site where transcription of a gene actually
begins
The bases in a promoter sequence are numbered in relation to the transcription start
site, which is labeled “+1”
The promoter attracts RNA polymerase, the enzyme responsible for transcribing RNA, to the gene.
Without a promoter, a gene sequence would not be transcribed.
Stages of Transcription
A. Initiation
In E. coli, the RNA polymerase holoenzyme is composed of
o Core enzyme
Four subunits = α2ββ′
o Sigma factor
One subunit = σ
These subunits play distinct functional roles
At the start of initiation, the RNA polymerase holoenzyme binds loosely to the DNA
It then scans along the DNA, until it encounters a promoter
When it does, the sigma factor recognizes both the –35 and –10 regions
A region within the sigma factor that contains a helix-turn-helix structure then interacts
strongly with the promoter, causing RNA polymerase to “tighten its grip” on the DNA.
The tight binding of the RNA polymerase to the promoter forms what is called the
closed complex
Then, the open complex is formed when RNA polymerase denatures the double-
stranded DNA in the AT-rich Pribnow Box
Next, the RNA polymerase makes a short RNA strand copy of the template strand within
the denatured region
o The sigma factor is released at this point
o This marks the end of initiation
o Note that RNA polymerase, unlike DNA polymerase, is a “smart enzyme”! It can
start an RNA strand all on its own.
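The scanning step described above can be sketched as a simple text search. The hexamers TTGACA (–35 region) and TATAAT (–10 region, the Pribnow Box) and the 16 to 18 base spacing are the commonly quoted E. coli consensus values; they are not given in this guide and are used here as assumptions. Real promoters rarely match the consensus exactly, so the exact-match search, the demo sequence and the function name are illustrative only.

```python
# Assumed consensus elements (not taken from this guide).
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def find_promoters(coding_strand, min_gap=16, max_gap=18):
    """Return (position of -35 element, position of -10 element) pairs.

    Positions are 0-based indexes into the coding strand, read 5' to 3'.
    """
    hits = []
    start = coding_strand.find(MINUS_35)
    while start != -1:
        end_35 = start + len(MINUS_35)
        # Look for the -10 element at each allowed spacing downstream.
        for gap in range(min_gap, max_gap + 1):
            pos_10 = end_35 + gap
            if coding_strand[pos_10:pos_10 + len(MINUS_10)] == MINUS_10:
                hits.append((start, pos_10))
        start = coding_strand.find(MINUS_35, start + 1)
    return hits


# Hypothetical sequence with one promoter: a 17-base spacer between elements.
demo = "GC" + MINUS_35 + "ACGTACGTACGTACGTA" + MINUS_10 + "GCGCGCAT"
print(find_promoters(demo))
```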
The core enzyme now slides down the DNA to synthesize the transcript
The open complex formed by the action of RNA polymerase is about 17 bases long and
remains that size as the polymerase moves along the DNA
o Behind the open complex, the DNA rewinds back into the double helix
Similar to the synthesis of DNA via DNA polymerase
It occurs when the short RNA-DNA hybrid of the open complex is forced to separate
o This releases the newly made RNA as well as the RNA polymerase
Many of the basic features of gene transcription are very similar in bacteria and eukaryotes, but the process is more
complex in eukaryotes. Why?
Specifically, in eukaryotes, transcription is achieved by three different types of RNA polymerase (RNA
pol I-III). These polymerases differ in the number and type of subunits they contain, as well as the class
of RNAs they transcribe; that is, RNA pol I transcribes ribosomal RNAs (rRNAs), RNA pol II transcribes
RNAs that will become messenger RNAs (mRNAs) and also small regulatory RNAs, and RNA pol III
transcribes small RNAs such as transfer RNAs (tRNAs).
Because RNA pol II transcribes protein-encoding genes, it has been of particular importance to scientists
who study the regulation of eukaryotic gene expression, and its function is well understood. For
example, researchers know that RNA pol II can bind to a DNA sequence within the promoter of many
genes, known as the TATA box, to initiate transcription. Together with other common motifs (short
recognition sequences in the DNA), these elements constitute the core promoter. However, changes in
RNA pol II affinity and, therefore, gene expression can be influenced by surrounding DNA sequences
(enhancers), which in turn recruit transcription factors. Also important is a large protein complex called
the mediator, which mediates interactions between RNA pol II and various regulatory transcription
factors. While these properties of transcription regulation are very important, they remain an area of
active research.
Splicing
Analysis of bacterial genes in the 1960s and 1970s revealed the following:
The sequence of DNA in the coding strand corresponds to the sequence of nucleotides
in the mRNA
This in turn corresponds to the sequence of amino acids in the polypeptide
This is termed the colinearity of gene expression
The “cap” consists of a backwards methylated guanine with a triphosphate link to the 5’ nucleotide in
the mRNA. It is added on and is not part of the original transcript. It is bound by cap binding proteins,
which, in turn, are recognized and bound by the ribosome during translation initiation. Thus, the cap
“marks” the RNA as an mRNA and aids in its recognition by the ribosome for translation.
3’ Polyadenylation
Most mature mRNAs have a string of adenine nucleotides at their 3’ ends
This is termed the polyA tail
The polyA tail, like the 5’ cap, is not encoded in the gene sequence
It is added enzymatically after the gene is completely transcribed
when the relationship between genes and proteins was first discovered it was initially
thought that the relationship was one-to-one: one gene coding for one polypeptide
o a gene
information coded in DNA nucleotide sequences
transcribed into mRNA
mRNA translated into a sequence of amino acids joined by peptide
bonds to produce a polypeptide
o polypeptide
polymer of amino acids joined by peptide bonds
each polypeptide’s function dependent on its precise sequence of
amino acids
exceptions:
o some genes produce more than one polypeptide
there are only about 21,000 human genes, but over 120,000 human
proteins
therefore, many genes produce more than one protein
this is possible because of post-transcriptional modification, combining
exons in various combinations
example: lymphocyte production of antibodies:
millions of different antibody proteins are produced from just a
few genes
different lymphocytes splice together parts of these genes in
different ways
o some genes do not code for protein
some genes code for tRNA
not translated into protein
transports amino acids to ribosomes
some genes code for rRNA
not translated into protein
a component of ribosome structure and function
some DNA sequences act as regulators of gene expression
regulatory DNA is transcribed into regulatory RNA
which then binds to other DNA sequences
determining whether those genes are transcribed or not
The mRNA contains sequences that are recognized by the translation machinery (the ribosome). The
start and stop codons are not important during transcription but are crucial signals during translation.
Translation is the final mechanism in Francis Crick’s central dogma of molecular biology, turning nucleic
acid bases (genotype) into protein structures (phenotype).
It is the order of the bases along a single strand that constitutes the genetic code. The four-letter
'alphabet' of A, T, G and C forms 'words' of three letters called codons, similar to the words on a page.
The words, when strung together, act as the blueprints that tell the cells of the body when and how to
grow, mature and perform various functions. Individual codons code for specific amino acids. A gene is a
sequence of nucleotides along a DNA strand - with 'start' and 'stop' codons and other regulatory
elements - that specifies a sequence of amino acids that are linked together to form a protein.
So, for example, the codon AGC codes for the amino acid serine, and the codon ACC codes for the amino
acid threonine.
It is universal. All life on Earth uses the same code (with a few minor exceptions).
It is degenerate. Most amino acids can be coded for by more than one codon. For example, AGC
and AGT both code for the amino acid serine. This is also known as codon redundancy.
A codon table sets out how the triplet codons code for specific amino acids.
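A codon table can be written down compactly and queried, as in the sketch below. It uses the standard genetic code with DNA letters (T rather than U) and one-letter amino acid symbols, where S is serine, T is threonine, M is methionine, W is tryptophan and * marks a stop codon. The variable and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 triplets to its one-letter amino acid symbol.
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """List every codon that codes for the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(CODON_TABLE["AGC"])   # serine
print(CODON_TABLE["ACC"])   # threonine
print(codons_for("S"))      # all codons for serine
print(codons_for("M"))      # methionine has a single codon
```

Listing the codons for each amino acid makes the degeneracy of the code visible: serine has six codons, while methionine and tryptophan have only one each.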
Three bases on the mRNA code for one amino acid in the protein. Transfer RNAs (tRNAs) carry
anticodons and are bound to the corresponding amino acids, according to the genetic code.
polysomes: several to many ribosomes translating the same mRNA into protein; each
moving in the 5’ to 3’ direction
start codon: the mRNA triplet codon AUG is universally the start codon used to mark the
beginning of the coding sequence of a gene; thus, the tRNA with the anticodon UAC and
carrying the amino acid methionine is always the first tRNA to enter the P-site during translation
stop codon: there are three stop codons in the genetic code; none of these have a
corresponding tRNA; instead, when a ribosome encounters a stop codon, a release factor binds
to the stop codon, which terminates translation and allows the separation of all of its
components
Elongation of Translation:
tRNA with anticodon complementary to second mRNA codon binds to ribosomal A site,
with appropriate amino acid attached to tRNA
enzymes in ribosome catalyze formation of peptide bond between 1st, P site, and 2nd, A
site, amino acids
P site tRNA, now separated from amino acid, exits ribosome
ribosome moves one codon (3 nucleotides) along the mRNA, thus shifting previous A-
site tRNA to P-site, and opening A-site
tRNA with anticodon complementary to A-site mRNA codon binds to ribosomal A-site,
with appropriate amino acid attached to tRNA terminal
enzymes in ribosome catalyze formation of peptide bond between 2nd and 3rd amino
acids
P site tRNA, now separated from its amino acid, exits ribosome
ribosome moves one codon (3 nucleotides) along the mRNA, thus shifting previous A-
site tRNA to P-site, and opening A-site
repetition of process until stop codon (see genetic code) is reached.
when ribosomal A-site reaches a stop codon, no tRNA has a complementary anticodon
release factor protein binds to ribosomal A-site stop codon
polypeptide and mRNA are released
large and small ribosomal subunits separate
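The cycle of steps above (start at AUG, add one amino acid per codon, release at a stop codon) can be sketched as a loop. This is a simplified model that ignores tRNAs, the ribosomal sites and release factors; the mRNA sequence and the function name are hypothetical, and amino acids are shown as one-letter symbols from the standard genetic code.

```python
from itertools import product

# Standard genetic code in RNA letters; "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA (5' to 3') from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    polypeptide = []
    # Move along the mRNA one codon (three nucleotides) at a time.
    for position in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[position:position + 3]]
        if amino_acid == "*":
            break  # stop codon: the polypeptide is released
        polypeptide.append(amino_acid)
    return "".join(polypeptide)


# Hypothetical mRNA: GG | AUG AGC ACC UAA | GG
print(translate("GGAUGAGCACCUAAGG"))
```

The output reads M (methionine), S (serine), T (threonine), matching the codons AUG, AGC and ACC before the stop codon UAA.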
The integrity of DNA sequences can be compromised during DNA replication when the genetic material
is being duplicated or by agents that cause damage to DNA.
The correct nucleotide has a greater affinity for moving polymerase than the incorrect nucleotide has.
DNA molecules with mismatched 3’ OH ends are not effective templates because
polymerase cannot extend when 3’ OH is not base paired.
DNA polymerase has a separate catalytic site that removes unpaired residues at the
terminus
Source: Molecular Biology of the Cell, 4th Edition
The diagram shows the 2 catalytic sites: P where polymerisation takes place, and E, where editing takes
place
DNA mismatch repair is a system which recognises and repairs erroneous insertion, deletion and mis-
incorporation of bases that can arise during DNA replication. It also repairs some forms of DNA damage.
Removes replication errors which are not recognised by the replication machine
Distinguishes the newly replicated strand from the parental strand by means of
methylation of A residues in GATC in bacteria
In mammals the newly synthesised strand is preferentially nicked and can be distinguished in this
manner from the parental strand.
If there is a defective copy of the mismatch repair gene, then a predisposition to cancer is the end
result.
Source: Molecular Biology of the Cell, 4th Edition
Chemical mutagens
Radiation
Free radicals
UV-B light causes crosslinking between adjacent cytosine and thymine bases creating pyrimidine
dimers. This is called direct DNA damage.
UV-A light creates mostly free radicals. The damage caused by free radicals is called indirect DNA
damage.
Ionizing radiation such as that created by radioactive decay or in cosmic rays causes breaks in DNA
strands.
Thermal disruption at elevated temperature increases the rate of depurination (loss of purine bases
from the DNA backbone) and single strand breaks. For example, hydrolytic depurination is seen in the
thermophilic bacteria, which grow in hot springs at 85–250 °C.[6] The rate of depurination (300 purine
residues per genome per generation) is too high in these species to be repaired by normal repair
machinery, hence a possibility of an adaptive response cannot be ruled out.
Industrial chemicals such as vinyl chloride and hydrogen peroxide, and environmental chemicals such as
polycyclic hydrocarbons found in smoke, soot and tar, create a huge diversity of DNA adducts:
ethenobases, oxidized bases, alkylated phosphotriesters and crosslinking of DNA, just to name a few.
The natural ageing process and respiration also cause DNA damage at a rate of around 10,000
lesions/cell/day.
The main types of DNA damage that occur are base loss and base modification.
Despite the thousands of alterations that occur in our DNA each day, very few are actually retained as
mutations, and this is due to highly efficient DNA repair mechanisms. The importance of DNA repair
is highlighted by the high number of genes that are devoted to it. Also, if
there is an inactivation or loss of function of the DNA repair genes, then this results in increased
mutation rates.
DNA damage can activate the expression of whole sets of genes, including:
The SOS response is a post-replication DNA repair system that allows DNA replication to bypass lesions
or errors in the DNA. The SOS uses the RecA protein. The RecA protein, stimulated by single-stranded
DNA, is involved in the inactivation of the LexA repressor thereby inducing the response. It is an error-
prone repair system.
BER is a cellular mechanism that repairs damaged DNA throughout the cell cycle. It is primarily
responsible for removing small, non-helix distorting base lesions from the genome. The related
nucleotide excision repair pathway repairs bulky helix-distorting lesions. BER is important for removing
damaged bases that could otherwise cause mutations by mispairing or lead to breaks in DNA during
replication. BER is initiated by DNA glycosylases, which recognize and remove specific damaged or
inappropriate bases, forming AP sites. These are then cleaved by an AP endonuclease. The resulting
single-strand break can then be processed by either short-patch (where a single nucleotide is replaced)
or long-patch BER (where 2-10 new nucleotides are synthesized).
C. AP endonuclease cuts the phosphodiester backbone
Source: Friedberg, E.C., Walker, G.C. and Siede, W. (1995). DNA Repair and Mutagenesis. American Society for Microbiology,
Washington DC, USA, pp. 91-225.
Non-homologous end-joining repair: the original DNA sequence is altered during repair
(by means of deletions or insertions).
When DNA repair fails, fewer mutations are corrected and this leads to an increase in the number of
mutations in the genome.
In most cases, the protein p53 monitors the repair of damaged DNA. If the damage is too
severe, however, p53 promotes programmed cell death (apoptosis).
Mutations in genes that encode the DNA repair proteins can also be inherited. This leads to
an overall increase in the number of mutations, because errors in or damage to the DNA are no longer
repaired efficiently.
As soon as we were able to understand how DNA replicates itself and how it is transcribed, we were
able to use the enzymes involved in these processes to create our own combinations of DNA fragments.
We call this newfound knowledge recombinant DNA technology. In this section you will learn how we
can make and identify recombinant DNA. You will learn about the different ways of visualising
recombinant DNA and the two great revolutions in recombinant DNA technology that have changed the
rate at which we are able to generate DNA data, namely the polymerase chain reaction and next generation
sequencing.
Chapter 9, Hartl and Jones, Genetics – Principles and Analysis (4th Edition).
Section Outcomes.
Recombinant DNA refers to the creation of new combinations of DNA segments that are not found
together in nature. The isolation and manipulation of genes allows for more precise genetic analysis as
well as practical applications in medicine, agriculture, industry and conservation.
Overview: Isolate DNA - Cut with restriction enzymes - Ligate into cloning vector - transform
recombinant DNA molecule into host cell - each transformed cell will divide many, many times to form
a colony of millions of cells, each of which carries the recombinant DNA molecule (DNA clone)
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by W.
H. Freeman and Company).
A. Isolating DNA
1. DNA is isolated by collecting cells, disrupting lipid membranes with detergents, destroying
proteins with phenol or proteases (e.g. Proteinase K), degrading RNA with RNase, precipitating with
alcohol to bring the remaining DNA out of solution, and then redissolving the DNA in water.
C. Joining DNA
Once you have isolated and cut the donor and vector DNAs, they must be joined together. The DNAs
are mixed together in a tube. If both have been cut with the same restriction enzyme (RE), the ends
will match up because they carry complementary single-stranded overhangs (sticky ends). DNA ligase is
the glue of molecular genetics that holds the ends of the DNAs together.
DNA ligase creates a phosphodiester bond between two DNA ends.
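The cutting and joining steps can be sketched in a few lines of Python. This is only an illustration: the enzyme EcoRI (recognition site GAATTC, cut between G and A) is not named in the notes above and is used here as a familiar example, and the donor sequence and the function name digest are invented.

```python
# Illustrative only: EcoRI and the toy sequence are assumptions,
# not taken from the course text.
SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts after the first base: G^AATTC

def digest(seq, site=SITE, offset=CUT_OFFSET):
    """Return the top-strand fragments made by cutting seq at every site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

donor = "TTACGGAATTCAGGCTAGAATTCCATG"
pieces = digest(donor)
print(pieces)

# "Ligation" of the fragments in their original order restores the sequence.
print("".join(pieces) == donor)
```

Every fragment downstream of a cut begins with AATT, the single-stranded overhang. Because any DNA cut with the same enzyme carries the same overhang, donor and vector ends can pair before ligase seals them.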
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996 by W.
H. Freeman and Company).
DNA clone = A section of DNA that has been inserted into a vector molecule and then replicated in
a host cell to form many copies.
E. Vectors
1. Requirements for a cloning vector
a) Should be capable of replicating in host cell
b) Should have convenient RE sites for inserting DNA of interest
c) Should have a selectable marker to indicate which host cells received
recombinant DNA molecule
d) Should be small and easy to isolate
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996
by W. H. Freeman and Company).
5. Expression vectors are vectors that carry host signals that facilitate the transcription
and translation of an inserted gene. They are very useful for expressing eukaryotic genes in
bacteria.
6. Yeast artificial chromosomes (YACs) are yeast vectors that have been engineered to
contain a centromere, telomere, origin of replication, and a selectable marker. They can carry up
to 1,000 kb of DNA. Since they are maintained in yeast (a eukaryote), they are useful for cloning
eukaryotic genes that contain introns. Also, eukaryotic genes are more easily expressed in a
eukaryotic host such as yeast.
7. Bacterial artificial chromosomes (BACs) are bacterial plasmids derived from the F plasmid.
They are capable of carrying up to 300 kb of DNA.
DNA Libraries are literally a collection of DNA clones that are stored in a certain vector. The goal is
to have each gene represented in the library at least once. This is like a real library: A collection of
books stored in a building, and the goal is to have at least one copy of all books in the library.
DNA libraries are the basis of many important techniques in molecular genetics, including ALL next
generation sequencing applications (we will come to this in a few lectures' time). So you will need to
understand what DNA libraries are and how we make them.
There are different types of libraries, depending on what you are interested in.
1. You can have a library named on the basis of the vector – eg. a plasmid library, or a phage
library or a YAC library. See last lecture for what these things are.
2. Or you can categorise the library by the source of the donor DNA.
a. Genomic - made from DNA fragments of total genomic DNA. Genomic DNA is usually cut into
smaller pieces by a restriction enzyme and the small pieces are then inserted into a vector. These
random genomic fragments result in a random shotgun library containing multiple overlapping
clones that cover the complete genome sequence.
b. Chromosome – made from RE DNA fragments of one chromosome isolated via flow
cytometry or pulsed field gel electrophoresis.
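How many clones does a genomic library need before each gene is likely to be represented at least once? A standard estimate, which is not given in these notes and is added here only as a sketch, is N = ln(1 - P) / ln(1 - f), where P is the desired probability of finding any given sequence and f is the fraction of the genome carried by one clone. The genome size below is a hypothetical round number; the insert sizes are the BAC and YAC capacities quoted earlier.

```python
import math

def clones_needed(insert_kb, genome_kb, probability=0.99):
    """Clones required so that a given sequence is present with the chosen
    probability, using N = ln(1 - P) / ln(1 - f)."""
    f = insert_kb / genome_kb   # fraction of the genome in one clone
    return math.ceil(math.log(1 - probability) / math.log(1 - f))

GENOME_KB = 3_000_000           # hypothetical genome size, for illustration

bac_clones = clones_needed(300, GENOME_KB)    # BAC-sized inserts
yac_clones = clones_needed(1000, GENOME_KB)   # YAC-sized inserts
print("BAC library:", bac_clones)
print("YAC library:", yac_clones)
```

The larger the insert a vector can carry, the fewer clones are needed to cover the same genome, which is one reason vectors with large capacity are used for big genomes.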
We are able to identify DNA of interest to our particular study in several ways. These include:
(From: AN INTRODUCTION TO GENETIC ANALYSIS 6/E BY Griffiths, Miller, Suzuki, Lewontin, Gelbart 1996
by W. H. Freeman and Company).
B. Complementation
Clones can be detected based on their ability to confer a
missing function on a mutant.
You know that your gene of interest (gene X) is linked to gene A, for which you have a probe: Using
a library of overlapping RE fragments - Isolate a clone (clone 1) containing A - RE analysis of clone
1- use end of clone 1 as a probe to isolate a new clone (clone 2) - RE analysis of clone 2 - use end of
clone 2 as a probe to isolate a new clone - etc - until you get to gene X
D. Tagging
Use a gene (tag) to which you have a probe to mark your gene of interest by inserting that gene into
your gene of interest
For example, you are interested in cloning genes that are important for iron transport…. Use
transposon (jumping gene) to hop randomly into the chromosome - Screen for those organisms that
are affected in iron transport - cross putative tagged iron transport mutants with tester to verify
that the mutant phenotype segregates with the tag - make library of the DNA from tagged mutant -
select or probe for the tag (and therefore your gene).
In this example, digestion with Enzyme 1 shows that there are two restriction sites for this enzyme,
but does not reveal whether the 3 kb segment is in the middle or on the end of the digested
sequence, which is 17 kb long. Combined digestion by both enzyme 1 and enzyme 2 leaves the 6 and
8 kb segments intact but cleaves the 3 kb segment, showing that enzyme 2 cuts within this enzyme 1
fragment. If the 3 kb section were on the outside of the fragment being studied, digestion by enzyme
2 alone would yield a 1 or 2 kb fragment. Since this is not the case, of the three restriction fragments
produced by enzyme 1, the 3 kb fragment must lie in the middle. That the RE2 site lies closer to the 6
kb section can be inferred from the 7 and 10 kb lengths of the enzyme 2 digestion.
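The reasoning in this example can be checked with a short Python sketch. The cut positions are inferred from the fragment sizes quoted above (fragment order 6 - 3 - 8 kb, with the enzyme 2 site 1 kb into the 3 kb fragment); the function name is invented for illustration.

```python
def fragments(length, cut_sites):
    """Sorted fragment sizes (kb) after cutting a linear molecule
    of the given length at the given positions."""
    points = [0] + sorted(cut_sites) + [length]
    return sorted(b - a for a, b in zip(points, points[1:]))

TOTAL = 17          # kb, length of the sequence being mapped
enzyme1 = [6, 9]    # fragment order 6 - 3 - 8
enzyme2 = [7]       # inside the 3 kb fragment, nearer the 6 kb side

print(fragments(TOTAL, enzyme1))             # enzyme 1 alone
print(fragments(TOTAL, enzyme2))             # enzyme 2 alone
print(fragments(TOTAL, enzyme1 + enzyme2))   # double digest

# Alternative map with the 3 kb fragment at the end (order 3 - 6 - 8):
# enzyme 2 alone would then release a 1 kb or 2 kb fragment.
print(fragments(TOTAL, [1]), fragments(TOTAL, [2]))
```

Only the first map reproduces all three observed digests (3, 6 and 8 kb; 7 and 10 kb; 1, 2, 6 and 8 kb), so the 3 kb fragment must lie in the middle.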
This procedure is now automated so that a computer reads the sequence. Instead of using
radioactive primers, the primers are labeled with a different-colored fluorescent dye for each reaction.
PCR is a rapid in vitro method of creating copies of, or amplifying, a defined target from a source of
DNA. Just as replication in living cells is a semi-conservative process using each strand of the DNA
double helix as a template for a new strand, so too is PCR. However, we must have some prior
knowledge of the DNA sequence of the target so that we can selectively amplify it from the otherwise
heterogeneous collection of DNA molecules present in a typical sample such as total genomic DNA.
For any given target it is necessary to design two short oligonucleotides (each ~20 nucleotides long)
that flank the region of interest. These so-called primers should bind specifically to the
complementary DNA sequences and in the presence of a heat-stable DNA polymerase and the
building blocks of DNA (dNTPs), they initiate the synthesis of new DNA strands.
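As a rough sketch of what "two primers that flank the region of interest" means, the Python below takes the top strand of a target and returns a forward and a reverse primer. The target sequence and function names are invented for illustration, and real primer design also weighs factors such as melting temperature and primer self-complementarity, which are ignored here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def flanking_primers(target, length=20):
    """Return (forward, reverse) primers for the top strand of a target.

    The forward primer copies the first `length` bases of the target;
    the reverse primer is the reverse complement of the last `length` bases.
    """
    forward = target[:length]
    reverse = reverse_complement(target[-length:])
    return forward, reverse

# Toy target (top strand, 5' to 3'), invented for illustration.
target = "ATGGCTAGCTAGGATCCGATCGATTACGGCTAGCTTAGGCTAACGGATCCTAGCTAGGCTA"
forward, reverse = flanking_primers(target)
print("Forward primer:", forward)
print("Reverse primer:", reverse)
```

The forward primer anneals to the bottom strand and the reverse primer to the top strand, so the polymerase extends them towards each other across the target.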
PCR consists of sequential cycles of synthesis and each cycle can be subdivided into three steps carried
out at different temperatures:
In practice, the whole cycling procedure is performed by a machine known as a thermocycler that can
rapidly and accurately switch between the desired temperatures.
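Because every new strand becomes a template in the next cycle, the number of target copies can at best double per cycle. The sketch below assumes this idealised doubling, which is not stated explicitly in these notes; the efficiency parameter is a hypothetical addition to show how a less-than-perfect reaction behaves.

```python
def pcr_copies(starting_copies, cycles, efficiency=1.0):
    """Copies after a number of cycles, assuming each cycle multiplies
    the target by (1 + efficiency); efficiency = 1.0 means perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles

for n in (10, 20, 30):
    print(n, "cycles:", f"{pcr_copies(1, n):,.0f}", "copies from one template")

# A reaction running at 90% efficiency (hypothetical value) yields far fewer copies.
print(f"{pcr_copies(1, 30, efficiency=0.9):,.0f}")
```

Under ideal doubling, a single template gives roughly a thousand copies after 10 cycles and roughly a billion after 30.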
Reverse transcription (RT) followed by polymerase chain reaction (PCR) represents a powerful tool for
the detection and quantification of mRNA. Real-time RT-PCR is widely used because of its high
sensitivity, good reproducibility, and wide dynamic quantification range. It can determine differences
in gene expression
Since then several other NGS platforms have been developed, such as Illumina, SOLiD, etc. While each
NGS platform is unique, the overall steps and the underlying concepts are similar.
A casual look at the world around you, at the plants, trees, insects, people, etc, shows you that the
world is made up of an incredible amount of variation. Where did this variation come from? How can
we detect it and measure it? These questions will be answered in this section.
Chapter 13, Hartl and Jones, Genetics – Principles and Analysis (4th Edition).
Chapter 6, Bergstrom and Dugatkin, Evolution (1st Edition).
Section Outcomes.
1. Compare and contrast the various mechanisms that generate genetic variation
2. Describe the different ways in which DNA can mutate.
3. Contrast the effects of these different mutations
4. Explain recombination and how it is advantageous
5. Describe what a transposon is
6. Explain gene conversion
7. Compare the relative advantages and disadvantages of different genetic markers
8. Compare phased with unphased genetic data
9. Describe how genetic variation can be organised
10. Explain how genetic variation can be measured at the individual and population level
A mutation is a change in the nucleotide sequence of the DNA in a cell. There are many different
kinds of mutations. Mutations can occur before, during, and after mitosis and meiosis. If a mutation
occurs in cells that will make gametes by meiosis or during meiosis itself, it can be passed on to
offspring and contribute to genetic variability of the population. Mutations are the sole source of
genetic variability that can occur in asexual reproduction. Mutations are usually harmful or neutral
to offspring but can occasionally be beneficial.
Point mutations
Point mutations create new alleles. They can result from the insertion, deletion, or substitution of
one nucleotide in a gene sequence. There are two causes of point mutations, but both are the result
of reactions catalyzed by DNA polymerase:
a. Uncorrected replication errors
b. Errors in repair of damaged sites
However, when the whole genome is taken into account (i.e., 20,000-25,000 genes in humans, 10,000 in
Drosophila), with an average mutation rate of 10^-5 to 10^-6 per gene, then 1 in 4 gametes
would carry a phenotypically detectable mutation somewhere in its genome.
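The "1 in 4" figure can be reproduced with simple arithmetic. The sketch below assumes the rate applies per gene per gamete and that genes mutate independently, and it takes 25,000 genes and a rate of 10^-5 as inputs; these are the upper ends of the ranges quoted, chosen here for illustration.

```python
def mutations_per_gamete(genes, rate):
    """Return (expected number of new mutations per gamete,
    probability that a gamete carries at least one)."""
    expected = genes * rate
    at_least_one = 1 - (1 - rate) ** genes
    return expected, at_least_one

expected, at_least_one = mutations_per_gamete(genes=25_000, rate=1e-5)
print(f"Expected mutations per gamete: {expected:.2f}")
print(f"Probability of at least one:   {at_least_one:.3f}")
```

The expected number is 0.25 per gamete, which is the "1 in 4" in the text; the probability of at least one mutation is slightly lower (about 0.22) because a few gametes carry more than one. At the lower rate of 10^-6 both values drop roughly tenfold.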
The mutation rate is also variable among individuals because of genetic variation in the enzymes used
for “proofreading” and repair of DNA.
Within an individual, the mutation rate can also vary among genes; this is also poorly understood.
a. The mutation rate in coding regions of genes is lower than in non-coding regions.
b. Repair systems apparently work on transcriptionally active genes only.
c. This explains why slippage mutations that generate microsatellite loci have such high
mutation rates: they are non-coding and are not transcriptionally active.
Mutation rates also vary among types of cells, or organisms. Rates are lower in eukaryotes than
prokaryotes.
Mutations also result from gene rearrangements and other large changes in the DNA sequence of a
chromosome. Chromosomal rearrangements change chromosomal structure and can alter the
function of one or more genes and can change the pattern of gene transmission. A translocation is
movement of a segment of DNA from one place to another in a chromosome or between
chromosomes. An inversion is a mutation in which a segment of DNA has flipped within a
chromosome. A deletion is the loss of a segment of DNA. These large changes are relatively
common, at least over long periods of time, and are abundant in genomes that have been
sequenced.
Deletion
This is a rearrangement that removes a segment of DNA.
Deletions can be located within a chromosome (interstitial) or can remove the end
of a chromosome (terminal). Deletions can be small (intragenic), affecting only one gene, or can
span multiple genes (multigenic). Deletions can arise from DNA damage (X-rays or chemical agents
that break chromosomes). If the deleted region does not contain any genes essential for survival, an
individual homozygous for a deletion (Del/Del) will live. An example is the original white allele in
Drosophila which is a small deletion affecting only the white gene. However, large deletions that
span multiple genes usually result in homozygous lethality because they remove essential genes.
What about individuals heterozygous for a normal chromosome and a deficiency chromosome
(heterozygote Del/+)? In some instances, heterozygotes are viable and fertile. There are at least two
reasons why heterozygosity for a deletion might be detrimental.
(1) Gene dosage problems: a deletion heterozygote will have only half the normal dose of each gene
that is missing in the deletion. In general, humans cannot survive (even as heterozygotes) with
deletions that remove more than ~3% of the genome. An example of a syndrome caused by a
heterozygous deletion is “cri du chat” syndrome, which results from a deletion of all or part of the
short arm of Chromosome 5. The diagnostic phenotypic feature of the syndrome is that the cry of an
affected infant sounds like a high-pitched cat cry, as well as many other features. Although
individuals usually survive, they often don’t live past childhood.
(2) Somatic mutation of the remaining normal copy of an essential gene may lead to defects (often
called "pseudodominance"). Individuals with retinoblastoma (malignant eye cancer) are often
heterozygous for deletions on Chromosome 13 in normal tissue; the disease results when a
somatic mutation in the remaining copy of the RB tumor suppressor gene occurs in retinal cells.
Duplication
This is a rearrangement that results in an increase in copy number of a particular chromosomal
region. In tandem duplications, the duplicated regions lie right next to one another, either in the
same order or in reverse order. In non-tandem duplications, the repeated regions lie far apart on the
same chromosome or on different chromosomes. Duplications can occur due to unequal crossing-
over, chromosome breaks and faulty repair, or replication errors. The dominant Bar mutation is a
tandem duplication of the 16A region of the Drosophila X chromosome. Unequal pairing and
crossing-over during meiosis in females homozygous for the Bar tandem duplication occasionally
leads to production of chromosomes bearing three copies of the region (which causes more extreme
double-bar eye phenotype) and chromosomes bearing one copy (conferring normal eyes). Another
A duplication is less likely to affect phenotype than a deletion of comparable size, since the
duplicated genes are still present. However, duplications in heterozygotes can have phenotypic
consequences if gene copy number is important (three copies of each gene in the duplicated region
are now expressed!), or if genes in the duplicated segment are now put into a new chromosomal
environment that alters their expression level or pattern (position effect).
Inversion
A rearrangement in which a chromosomal segment is rotated 180 degrees. Inversions in which the
rotated segment includes the centromere are called pericentric inversions; those in which the
rotated segment is located completely on one chromosomal arm and do not include the centromere
are called paracentric inversions.
Inversions can occur when two double-strand breaks release a chromosomal region that inverts
before religating to flanking DNA, or by intrachromosomal recombination.
Even though gene order is changed in an inversion, many inversions do not cause abnormal
phenotypes. Many inversions can be made homozygous, and inversions can be detected in haploid
organisms. However, if the breakpoint of an inversion is within an essential gene, individuals
homozygous for the inversion will not survive. Unusual phenotypes can also be observed if the
inversion places a gene or group of genes in a new regulatory environment.
In inversion heterozygotes, the observed number of recombinant progeny is reduced. Why?
Because, during meiosis, the homologous chromosomes in inversion heterozygotes form an
inversion loop to maximize pairing. Recombination within the inversion loop leads to abnormal
chromatids (whether the inversion is pericentric or paracentric). Thus, even though crossovers
occur, the abnormal recombinant gametes can rarely give rise to viable progeny upon
fertilization. We see a preponderance of nonrecombinant progeny.
During meiosis in reciprocal translocation heterozygotes, the translocated chromosomes and the
normal homologous chromosomes maximize pairing by forming a cross-like structure among all
four chromosomes (instead of the usual two). A special kind of translocation, called a Robertsonian
translocation, is one in which a reciprocal exchange between two acrocentric chromosomes leads to
a large metacentric chromosome and a very small chromosome (that may even carry so few genes
that it does not cause genetic imbalance and is lost). In fact, a less common form of Down's
Syndrome (<5% of cases) is caused by a Robertsonian translocation between Chromosome 21 and
Chromosome 14. This form of Down's Syndrome can recur in families.
When considering the effects of mutations we must make a distinction between the direct effect
that a mutation has on the functioning of a genome and its indirect effect on the phenotype of the
organism in which it occurs. The direct effect is relatively easy to assess because we can use our
understanding of gene structure and expression to predict the impact that a mutation will have on
genome function. The indirect effects are more complex because these relate to the phenotype of
the mutated organism which is often difficult to correlate with the activities of individual genes.
Many mutations result in nucleotide sequence changes that have no effect on the functioning of the
genome. These silent mutations include virtually all of those that occur in intergenic DNA and in the
non-coding components of genes and gene-related sequences. In other words, some 98.5% of the
genome can be mutated without significant effect. These are said to be “neutral” mutations.
Mutations in the coding regions of genes are much more important. First, we will look at point
mutations that change the sequence of a triplet codon. A mutation of this type will have one of four
effects:
1. It may result in a synonymous change, the new codon specifying the same amino
acid as the unmutated codon. This is because of codon redundancy. A synonymous change
is therefore a silent mutation because it has no effect on the coding function of the genome:
the mutated gene codes for exactly the same protein as the unmutated gene.
2. It may result in a non-synonymous change, the mutation altering the codon so that
it specifies a different amino acid. The protein coded by the mutated gene therefore has a
single amino acid change. This often has no significant effect on the biological activity of the
protein because most proteins can tolerate at least a few amino acid changes without
noticeable effect on their ability to function in the cell, but changes to some amino acids,
such as those at the active site of an enzyme, have a greater impact. A non-synonymous
change is also called a missense mutation.
3. The mutation may convert a codon that specifies an amino acid into a termination
codon. This is a nonsense mutation and it results in a shortened protein because translation
of the mRNA stops at this new termination codon rather than proceeding to the correct
termination codon further downstream. The effect of this on protein activity depends on
how much of the polypeptide is lost: usually the effect is drastic and the protein is non-
functional.
4. The mutation could convert a termination codon into one specifying an amino acid,
resulting in read through of the stop signal so the protein is extended by an additional series
of amino acids at its C terminus. Most proteins can tolerate short extensions without an
effect on function, but longer extensions might interfere with folding of the protein and so
result in reduced activity.
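The four outcomes listed above can be demonstrated with a small Python sketch that looks up the original and mutated codon in the standard genetic code. The function name and the example codons are chosen for illustration; codons are written as DNA (T instead of U).

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}   # "*" marks a termination codon

def classify(original, mutated):
    """Classify a point mutation by its effect on a single codon."""
    before, after = CODON_TABLE[original], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "readthrough"
    return "non-synonymous (missense)"

for original, mutated in [("CTT", "CTC"), ("GAG", "GTG"),
                          ("TGG", "TGA"), ("TAA", "CAA")]:
    print(original, "->", mutated, ":", classify(original, mutated))
```

The four example pairs give one case of each type: leucine to leucine, glutamate to valine, tryptophan to stop, and stop to glutamine.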
Deletion and insertion mutations (indels) also have distinct effects on the coding capabilities of
genes (Figure 14.12). If the number of deleted or inserted nucleotides is three or a multiple of three
then one or more codons are removed or added, the resulting loss or gain of amino acids having
varying effects on the function of the encoded protein. Deletions or insertions of this type are often
inconsequential but will have an impact if, for example, amino acids involved in an enzyme's active
site are lost, or if an insertion disrupts an important secondary structure in the protein. Replication
slippage is responsible for the trinucleotide repeat expansion diseases that have been discovered in
humans in recent years. Each of these neurodegenerative diseases is caused by a relatively short
series of trinucleotide repeats becoming elongated to two or more times its normal length. For
example, the human HD gene contains the sequence 5′-CAG-3′ repeated between 6 and 35 times in
tandem, coding for a series of glutamines in the protein product. In Huntington's disease this repeat
expands to a copy number of 36–121, increasing the length of the polyglutamine tract and resulting
in a dysfunctional protein.
On the other hand, if the number of deleted or inserted nucleotides is not three or a multiple of
three then a frameshift results, all of the codons downstream of the mutation being taken from a
different reading frame from that used in the unmutated gene. This usually has a significant effect
on the protein function, because a greater or lesser part of the mutated polypeptide has a
completely different sequence to the normal polypeptide.
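The difference between an in-frame indel and a frameshift is easy to see by splitting a sequence into codons before and after a deletion. The short gene below is invented for illustration.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def indel_effect(length_change):
    """Classify an insertion or deletion by the number of bases gained or lost."""
    n = abs(length_change)
    if n == 0:
        return "no indel"
    return "in-frame" if n % 3 == 0 else "frameshift"

gene = "ATGGCTGAACGTTAA"                  # toy gene: ATG GCT GAA CGT TAA

three_deleted = gene[:3] + gene[6:]       # remove the whole second codon
one_deleted = gene[:3] + gene[4:]         # remove a single base

print(codons(gene))
print(indel_effect(-3), codons(three_deleted))
print(indel_effect(-1), codons(one_deleted))
```

Deleting three bases removes one codon and leaves the rest unchanged, whereas deleting one base changes every codon downstream of the deletion, including the original stop codon.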
It is less easy to make generalizations about the effects of mutations that occur outside of the coding
regions of a gene. In DNA-protein interactions, any protein binding site is susceptible to point,
insertion or deletion mutations that change the identity or relative positioning of nucleotides
involved. These mutations therefore have the potential to inactivate promoters or regulatory
sequences, with predictable consequences for gene expression. Origins of replication could
conceivably be made non-functional by mutations that change, delete or disrupt sequences
recognized by the relevant binding proteins but these possibilities are not well documented.
Now we turn to the indirect effects that mutations have on organisms, beginning with multicellular
diploid eukaryotes such as humans. The first issue to consider is the relative importance of the same
mutation in a somatic cell compared with a germ cell. Because somatic cells do not pass copies of
their genomes to the next generation, a somatic cell mutation is important only for the organism in
which it occurs: it has no potential evolutionary impact. In fact, most somatic cell mutations have
no significant effect, even if they result in cell death, because there are many other identical cells in
the same tissue and the loss of one cell is immaterial. An exception is when a mutation causes a
somatic cell to malfunction in a way that is harmful to the organism, for instance by inducing tumor
formation or other cancerous activity.
Mutations in germ cells are more important because they can be transmitted to members of the
next generation and will then be present in all the cells of any individual who inherits the mutation.
Most mutations, including all silent ones and many in coding regions, will still not change the
phenotype of the organism in any significant way. Those that do have an effect can be divided into
two categories:
Assessing the effects of mutations on the phenotypes of multicellular organisms can be difficult. Not
all mutations have an immediate impact: some are delayed onset and only confer an altered
phenotype later in the individual's life. Others display non-penetrance in some individuals, never
being expressed even though the individual has a dominant mutation or is a homozygous recessive.
With humans, these factors complicate attempts to map disease-causing mutations by pedigree
analysis (Section 5.2.4) because they introduce uncertainty about which members of a pedigree
carry a mutant allele. The effects of a mutation will also depend on environmental conditions, e.g.
sickle-cell anaemia.
Homologous recombination
This is the most important form of recombination in nature, being responsible for crossing over
during meiosis in eukaryotic cells and the integration of transferred DNA into bacterial
(prokaryotic) genomes after conjugation, transduction or transformation.
Linkage
Tracking the movement of genes resulting from crossovers has proven quite useful to geneticists.
Because two genes that are close together (linked) are less likely to become separated than genes
that are farther apart, geneticists can deduce roughly how far apart two genes are (or how tightly
linked they are) on a chromosome if they know the frequency of the crossovers. Geneticists can also
use this method to infer the presence of certain genes. Linked genes therefore typically stay
together during recombination. One gene in a linked pair can sometimes be used as a marker to
deduce the presence of another gene. This is typically used in order to detect the presence of a
disease-causing gene.
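The link between crossover frequency and distance can be made concrete with a small calculation. The sketch below uses the common convention that 1% recombinant offspring corresponds to one map unit (centimorgan); this convention is not stated in these notes and only holds approximately, for genes that are fairly close together. The offspring counts are invented for illustration.

```python
def map_distance_cM(recombinants, total):
    """Approximate map distance: percentage of offspring that are recombinant."""
    return 100 * recombinants / total

# Hypothetical crosses, each scoring 1,000 offspring.
tight = map_distance_cM(recombinants=30, total=1000)
loose = map_distance_cM(recombinants=120, total=1000)

print(f"Gene pair 1: {tight:.1f} map units (tightly linked)")
print(f"Gene pair 2: {loose:.1f} map units (further apart)")
```

The pair with fewer recombinant offspring is the more tightly linked, and so is the better candidate when one gene is to be used as a marker for the other.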
Gene duplication
Sometimes crossing over is unequal. One chromosome gets a longer piece of its homologue than the
other chromosome gets in return. This can result in gene duplication in the chromosome that got
more DNA. Gene duplication can give rise to new genes because the extra copy can accumulate
mutations while the original copy continues to carry out its normal function. Analyses of the
genomes of many organisms suggest that genes are often duplicated over evolutionary time. The
groups of duplicated genes are referred to as "gene families", owing to the resemblance of their
sequences and their origin by descent from a common ancestor gene.
Gene duplication
1. Gene duplications are either short or long segments of extra chromosome material
originating from duplicated sequences within a genome.
2. This occurs during crossing over, but what happens is unequal cross-over.
a. Deletion of a substantial amount of genetic material results in inviable gametes or zygotes.
b. Duplication of genetic material on the other chromosome may be advantageous. The
duplicate material is “free” to mutate without fitness consequences.
3. Gene family - genes that have similar DNA sequences, but differ to some degree in
sequence and often in function.
4. The globin gene family = two clusters of loci coding for component subunits of hemoglobin.
The shuffling of genes brought about by genetic recombination produces increased genetic
variation. It also allows sexually reproducing organisms to avoid Muller's ratchet, in which
the genomes of an asexual population accumulate deleterious mutations in an irreversible manner.
Independent assortment
Mutations occur during DNA replication prior to meiosis. Crossing over during meiosis mixes alleles
from different homologues into new combinations. However, a further source of variation comes
from the fact that each chromosome assorts independently of the others. When meiosis is complete,
the resulting eggs or sperm have a mixture of maternal and paternal chromosomes. This is because
during anaphase I, the spindle accurately separates a complete set of 23 human chromosomes into
each daughter cell but does not distinguish between the 23 from Mom and the 23 from Dad. Mom's
and Dad's homologues are randomly intermixed during anaphase such that each egg or sperm cell
has a nearly unique combination of Mom's and Dad's alleles. The number of combinations of 23
maternal and paternal homologues that can result from independent assortment is 2^23, about 8
million. This does not include the variation caused by mutations or crossing over!
Fertilization
Fertilization randomly brings together two gametes produced in two different individuals. This
means that for a particular man and woman, the number of unique combinations of genes that
could occur in their offspring is 8 million times 8 million (64 trillion), not counting variation caused by
crossing over and mutation. Random fertilization is a further mechanism that produces genetic
variation in the process of sexual reproduction.
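The numbers quoted for independent assortment and fertilization follow directly from powers of two, as this short Python check shows. The variable names are chosen for illustration.

```python
PAIRS = 23                          # homologous chromosome pairs in humans

per_gamete = 2 ** PAIRS             # each pair contributes Mom's or Dad's copy
per_couple = per_gamete ** 2        # any egg can meet any sperm

print(f"Chromosome combinations per gamete: {per_gamete:,}")
print(f"Combinations per couple:            {per_couple:,}")
```

The exact values are 8,388,608 per gamete and about 70 trillion per couple. The figure of 64 trillion in the text comes from rounding 2^23 down to 8 million before squaring; either way, crossing over and mutation push the true number far higher.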
The genetic variation that results from mutations, meiosis, and fertilization causes the phenomenon
with which we are all familiar: even in very large populations, such as the human population, every
individual is genetically unique.
Advantages of Polyploidy. Polyploidy probably has some advantages in both plants and animals.
a. Extra chromosomes may act as multiple buffers in various organismic processes.
b. Additional chromosomes may provide the chance to evolve new functions.
c. Duplication of genome offers same possibilities for mutations to produce novel traits
as does gene duplication (it’s just on a larger scale) Rate of
Integrated transposable elements are flanked by short direct repeat sequences. This particular transposon is
flanked by the tetranucleotide repeat 5′-CTGG-3′. Other transposons have different direct repeat sequences.
The various types of transposable element known in eukaryotes and prokaryotes can be broadly
divided into three categories on the basis of their transposition mechanism.
1. DNA transposons that transpose replicatively, the original transposon remaining in
place and a new copy appearing elsewhere in the genome;
2. DNA transposons that transpose conservatively, the original transposon moving to a
new site by a cut-and-paste process;
3. Retro-elements, all of which transpose via an RNA intermediate, e.g. retroviruses,
which include the human immunodeficiency viruses that cause AIDS and various other
virulent types.
Gene conversion. One gamete contains allele A and the other contains allele a. These fuse to
produce a zygote that gives rise to four haploid spores, all contained in a single ascus. Normally, two
of the spores will have allele A and two will have allele a.
Geneticists rely on markers to detect genetic variation. Markers are simply features of an organism,
whether a physical attribute, a protein or enzyme, or a piece of DNA, that can be used to compare
individuals to each other. The most important attribute of a marker is that it is polymorphic – that is,
it must have more than one variant or allele. There is no point in examining a marker if it is not
polymorphic – all the individuals will have the same allele and there will be almost nothing that we
can infer. Here we will go through the main markers that have been used by biologists in historical
order.
Morphometrics
Morphometrics is the quantitative analysis of form, a concept that encompasses size and shape. It can
serve as a marker because form is genetically encoded. Morphometric analyses are commonly performed
on organisms and can be used to quantify a trait of evolutionary significance and, by detecting
changes in shape, to deduce something of its ontogeny, function or evolutionary relationships. A major
objective of morphometrics is to statistically test hypotheses about the factors that affect shape.
Methods:
1. Traditional morphometrics analyzes lengths, widths, masses, angles, ratios and areas.
2. In landmark-based geometric morphometrics, the spatial information missing from traditional
morphometrics is contained in the data, because the data are coordinates of landmarks.
Advantages
Cheap
Allows comparisons between organisms
Disadvantages
Few characters
Homology in characters is assumed but not known.
Unphased data
DNA-DNA hybridisation
An early and crude method for determining how closely related two organisms were.
Method: The DNA of one organism is labeled, then mixed with the unlabeled DNA to be compared
against. The mixture is incubated to allow DNA strands to dissociate and reanneal, forming hybrid
double-stranded DNA. Hybridized sequences with a high degree of similarity will bind more firmly,
and require more energy to separate them: i.e. they separate when heated at a higher temperature
than dissimilar sequences, a process known as "DNA melting".
Isozymes
Different molecular forms of an enzyme (protein) that catalyze the same reaction. This was the very
first time that biologists could actually see allelic polymorphisms (that is, whether an individual is
homozygous or heterozygous) directly.
Advantages of RFLPs:
o Higher level of polymorphism than isozymes (mitochondrial, nuclear, chloroplast)
o Larger number of loci
o Selective neutrality
Limitations of RFLP
o Slow and expensive
o Polymorphism is still limited
o Unphased data
Basis: Detection of differences in patterns of DNA amplification from short primers of arbitrary
sequence
Method: RAPD PCR
Denaturation of DNA and annealing of primers
Primer extension
Repeat cycling 20 times
Electrophorese PCR products
Stain and score
Advantages of RAPDs:
More polymorphic than RFLPs
Simple and quick
Selective neutrality
Limitations of RAPDs
Reproducibility among labs may be a problem
Loci may not be directly comparable
Unphased data
Advantages
highly sensitive
highly reproducible (repeatable)
selective neutrality
Disadvantages:
technically demanding
expensive
unphased data
Basis:
variable number of short tandemly repeated sequences
Method:
sequence SSR and adjacent DNA
design PCR primers and conduct a PCR amplification of DNA
separate amplified DNA by electrophoresis
Advantages:
high level of polymorphism
easy and fast to run
robust and reproducible
Phased data - Allelic variation
Disadvantage:
Time-consuming and expensive to develop
Method:
DNA extraction
PCR
DNA sequencing
Advantages
highly reproducible
very informative
Disadvantages
expensive
knowledge of sequence is required for primer design
Amount of polymorphism depends on mutation rate
Unphased data, but can use cloning to determine phase
In the past decade, the development of high-throughput methods for genomic sequencing
(next-generation sequencing: NGS) has revolutionized how many geneticists collect data. It is now
possible to produce so much data so rapidly that simply storing and processing the data poses great
challenges [10].
To some extent the most important opportunity provided by NGS is simply that we now
have a lot more data to answer the same questions. For example, using a technique like RAD
sequencing [1] or genotyping-by-sequencing (GBS: [2]), it is now possible to identify thousands of
polymorphic SNP markers in non-model organisms, even if you don’t have a reference genome
available.
Method: Digest genomic DNA from each individual with a restriction enzyme, and ligate an adapter
to the resulting fragments. The adapter includes a forward amplification primer, a sequencing
primer and a “barcode” used to identify the individual from which the DNA was extracted. Pool the
individually barcoded samples (“normalizing” the mixture so that roughly equal amounts of DNA
from each individual are present), shear them, and select fragments of a size appropriate for the
sequencing platform you are using. Ligate a second adapter to the sample, where the second
adapter is the reverse complement of the reverse amplification primer. PCR amplification will enrich
only DNA fragments having both the forward and reverse amplification primer. The resulting library
consists of sequences within a relatively small distance from restriction sites.
Advantages
1. Can be used in any organism from which you can extract DNA
2. Laboratory manipulations are relatively straightforward.
3. Huge amount of data
Disadvantages
1. Polymorphisms may not be selectively neutral
2. The number of loci in common to all animals in the population will decrease as the number
of animals sampled increases
Method: Digest genomic DNA with a restriction enzyme and ligate two adapters to the genomic
fragments. One adapter contains a barcode and the other does not. Pool the samples. PCR amplify
and sequence. Not all ligated fragments will be sequenced because some will contain only one
adapter and some fragments will be too long for the NGS platform.
Advantages
1. Can be used in any organism from which you can extract DNA
2. Laboratory manipulations are relatively straightforward.
3. Huge amount of data
Disadvantages
1. Polymorphisms may not be selectively neutral
2. The number of loci recovered is less than in RadSeq.
This is a laboratory process that determines the complete DNA sequence of an organism's genome
at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA
contained in the mitochondria and, for plants, in the chloroplast. During whole genome sequencing,
researchers collect a DNA sample and then determine the identity of the billions of nucleotides that
compose the genome. The very first human genome was completed in 2003 as part of the Human
Genome Project, which was formally started in 1990. Today, sequencing technology is much more
efficient, and a human genome can be sequenced in a matter of days for under $10,000. The first
human genome cost $2.7 billion.
Advantages
The whole genome of the organism is available for analysis. This is much more than the
reduced representation methods previously used.
Disadvantages
High error rate, requires high coverage for accuracy
Short sequence reads (70-90 bp)
Difficult assembly
In order to measure genetic variation, we first need to figure out what measurement is most
appropriate. This will depend on how the genetic variation is organised, which can be in several ways:
1. It can be organised based on the kind of genetic marker you are dealing with.
A) In eukaryotes for example, some markers can distinguish between the two copies (alleles) of a
locus that you inherited from each of your parents. These are diploid markers and examples include
microsatellite loci, allozymes, RadSeq, GBS and whole genome sequences.
2. Genetic variation can also be measured as the number of copies present of a particular locus or
gene. Even in eukaryotes, many loci are present in more than the two copies inherited from mum and
dad. Why? At some point in the past, a gene duplication or gene conversion event occurred and
resulted in multiple copies of the same gene. It can be very useful to know how many copies an
individual has of a particular gene. This is called variation in copy number (VCN).
Telomeres are a classic example. They are sequences of repeated DNA elements that protect the
ends of the chromosome from damage. They are like a cap for the ends of the chromosomes.
However, as we age, these caps get smaller and smaller. So if you look at the VCN in the amount of
telomere DNA in an individual at two different points in time, you can figure out how much they
have aged.
A) Individual measures of genetic variation
1. Number of alleles or allele copies (copy number)
2. Individual heterozygosity. Count the number of loci at which the individual is heterozygous, then
divide by the total number of loci.
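The calculation of individual heterozygosity described above can be sketched in Python. This is only a minimal illustration: the function name, the locus names and the allele sizes are made up for the example and are not part of the course software.

```python
def individual_heterozygosity(genotype):
    """Proportion of loci at which an individual is heterozygous.

    genotype maps each locus name to the pair of alleles the individual
    carries at that locus, e.g. ("A", "a").
    """
    if not genotype:
        raise ValueError("at least one locus is required")
    # A locus is heterozygous when its two alleles differ.
    heterozygous_loci = sum(1 for a1, a2 in genotype.values() if a1 != a2)
    return heterozygous_loci / len(genotype)


# Hypothetical microsatellite genotype: allele sizes (bp) at four loci.
example_individual = {
    "locus1": (152, 156),
    "locus2": (210, 210),
    "locus3": (98, 104),
    "locus4": (180, 180),
}

print(individual_heterozygosity(example_individual))  # 2 of 4 loci -> 0.5
```

The same function works for any diploid marker, as long as the two alleles at each locus can be told apart.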
Most commonly genetic variation within a population is measured as the percentage of gene loci
that are polymorphic or the percentage of gene loci in individuals that are heterozygous.
1. The frequency of each allele at a locus is calculated by direct counting. The mean number
of alleles (MNA) observed over a range of loci for different populations is considered to be a
reasonable indicator of genetic variation.
2. Allelic richness (Ar). A measure of the number of alleles per locus that allows comparisons to be
made between samples of different sizes.
4. Expected heterozygosity. We can also work out the heterozygosity we would expect given a set of
allele frequencies: sum the squares of the frequencies of all alleles at a locus and subtract the sum
from one. $H_e = 1 - \sum_i p_i^2$, where $p_i$ is the frequency of allele $i$.
5. Number of polymorphic sites. This is the number of nucleotide sites that are different in a
population of DNA sequences.
6. Nucleotide diversity. This is the average number of nucleotide differences per site between
pairs of sequences in the DNA fragment analysed.
7. Haplotype diversity (or gene diversity). This is a measure of the uniqueness of a particular
haplotype in a given population.
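Several of the population measures listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for the software used in the practicals: the function names and the toy data are invented, the sequences are assumed to be aligned and of equal length, and nucleotide diversity is computed here as the mean number of pairwise differences per site.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Allele frequencies at one locus by direct counting of allele copies."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def expected_heterozygosity(alleles):
    """He = 1 - sum of squared allele frequencies at one locus."""
    freqs = allele_frequencies(alleles)
    return 1 - sum(p ** 2 for p in freqs.values())


def polymorphic_sites(sequences):
    """Number of alignment columns containing more than one nucleotide."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def nucleotide_diversity(sequences):
    """Mean number of pairwise differences per site."""
    pairs = list(combinations(sequences, 2))
    length = len(sequences[0])
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical sample: 8 allele copies at one locus (6 x "A", 2 x "a").
locus = list("AAAAAAaa")
print(allele_frequencies(locus))       # {'A': 0.75, 'a': 0.25}
print(expected_heterozygosity(locus))  # 1 - (0.75**2 + 0.25**2) = 0.375

# Hypothetical aligned DNA sequences from three individuals.
seqs = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(polymorphic_sites(seqs))         # 2
print(nucleotide_diversity(seqs))      # 4 differences / (3 pairs * 8 sites)
```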
Mutation: occurs randomly. This is what generates the genetic variation upon which the other
evolutionary forces (below) act. See also previous lectures on mutation. Mutations generate
different alleles and the frequencies of these alleles in a population are determined by the three
forces listed below.
Selection: This is the external pressure of the environment that determines which mutations (alleles)
survive to the next generation and which do not. Selection changes the frequency of these alleles
with time. Strong selection pressure will change allele frequencies faster than weak selection pressure.
Genetic Drift: This is a random process in which alleles are lost from a population due to pure
chance. The effect of genetic drift on allele frequencies depends on the size of the population and
the level of migration into that population. Genetic drift can change the frequency of alleles in a
population quickly if it is small and very quickly if it is small AND isolated (no migration to bring in
new alleles). On the other hand the effect of drift is small if the population is large and very small if
the population is large and there is migration from other populations.
Migration: Immigration into a population and emigration out of a population will also change the
allele frequencies of the population, especially if the migrants are breeding in that population.
Migration interacts with both selection and drift.
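The dependence of genetic drift on population size can be illustrated with a small simulation. The sketch below assumes a simple Wright-Fisher style model (random sampling of allele copies each generation, with no selection, mutation or migration); the function name, population sizes and number of generations are arbitrary choices for the example.

```python
import random


def simulate_drift(pop_size, start_freq, generations, rng):
    """Track one allele's frequency under drift in a diploid population.

    Each generation, 2 * pop_size allele copies are drawn at random
    according to the allele frequency of the previous generation.
    """
    n_copies = 2 * pop_size
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Random sampling of allele copies: pure chance, no selection.
        copies = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = copies / n_copies
        trajectory.append(freq)
    return trajectory


rng = random.Random(1)  # fixed seed so the run is repeatable
for size in (10, 1000):
    final = simulate_drift(size, 0.5, 50, rng)[-1]
    print(f"N = {size}: allele frequency after 50 generations = {final:.3f}")
```

Running this several times with different seeds should show the small population wandering far from 0.5, often to loss (0) or fixation (1), while the large population stays close to its starting frequency.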
Chapter 15, Hartl and Jones, Genetics – Principles and Analysis (4th Edition).
Chapters 7 and 8, Bergstrom and Dugatkin Evolution (1st Edition)
1. Explain Lamarckism
2. Differentiate between the central tenets of natural selection and evaluate the evidence for
natural selection.
3. Compare the effects of stabilising, directional and diversifying selection on population allele
frequencies
4. Evaluate the contribution of Mendel, the Modern Synthesis and Neutral Theory to the study
of evolution.
5. Explain the four evolutionary forces and how they can drive evolution through changing
allele frequencies in populations.
6. Predict the outcome of crosses including the use of the Punnett square.
7. Explain the cases in which inheritance deviates from Mendelian expectation
8. Explain how sexual selection can change population allele frequencies.
9. Analyze a population using the Hardy-Weinberg calculations. Apply chi square analysis to
those predictions
10. Explain the interaction between selection, genetic drift and migration
11. Explain Dobzhansky’s idea of how changes in gene frequencies could lead to species.
12. Evaluate the neutral theory based on its predictions.
Darwin was not the first person to try and make sense of how the diversity of biological life came to
be. He was preceded by a number of great thinkers. It is very important to realise that the ideas (also
known as hypotheses) put forward by these thinkers may seem obviously wrong by today’s
standards, yet without these ideas, there would be no way to move forward scientifically. Science
and progress, therefore, depend on new ideas, no matter whether they stand the test of time or not.
Evolutionary biology, and science in general, has also undergone its own “evolution”. The major
developments prior to Darwin were:
“we must not accept a general principle from logic only, but must prove its
application to each fact; for it is in facts that we must seek general principles, and
these must always accord with the facts”
3. The realisation that the world was ever-changing. The Greeks also realised that the
world had changed when they discovered fossilised marine invertebrates embedded in
mountainous rocks, and concluded that these rocks must at one time have been at the bottom of
the ocean.
4. The world is very, very old. The ancient nature of the Earth only really became fully
appreciated in the late 18th/early 19th century when geologists like James Hutton and Charles
Lyell looked at everyday natural processes, such as soil erosion, and calculated the amount
of time needed for these processes to erode large geological features like canyons. Darwin
drew heavily on the work of Lyell, especially the idea that great features can be brought
about through slow and gradual changes, acting over the course of a very long time.
5. The development of natural history to take stock of the diversity of life. Aristotle
described hundreds of species of animals and developed a taxonomy of nature, a
classification system of life that would later be called “the great chain of being,” or scala
naturae. But people still thought of life as being spontaneously generated, and not evolved
through common ancestry. Erasmus Darwin (Charles Darwin’s grandfather) put forward the
idea of common ancestry, stating that all life was descended from a “single living filament”.
Robert Chambers outlined his principle of progressive development, in which he
hypothesized that new species arise from old species.
While the advances above were monumental, they still did not fully explain how in the diversity of
life, organisms have come to be perfectly suited to their environment. The theologian William Paley
attempted to describe the complexity of life as a watch, with all its complex springs and cogs,
perfectly suited for telling the time. He concluded that such complexity could not have come
together simply by chance, and could only have been designed by a watchmaker, in other words, the
complexity of life and its perfect match to its environment could only have been created by god.
On the other hand, Lamarck used nature to describe his theory of how species change or evolve to
perfectly suit their environments. He postulated that phenotypic characteristics acquired during
the lifetime of an organism can be inherited by its offspring. The famous example of Lamarck’s
theory is that of the short-necked giraffe. At one point in evolutionary time, all giraffes had short
necks; then, to reach the best leaves at the tops of the trees, the giraffes had to stretch their necks.
Those giraffes that stretched their necks the furthest were the most successful and passed on their
characteristics to the next generation, and after several generations, all giraffes had long necks.
This is Lamarck’s theory of inheritance of acquired characteristics, also known as Lamarckism.
Lamarck’s idea was reasonable: he used a natural explanation and did not need to invoke god. But
we know now that acquired characteristics cannot be inherited, and we now ground our ideas of
how traits are passed from generation to generation in the laws of genetics, which you have learned
in previous sections. These laws, however, were formulated about 100 years after Lamarck.
Lamarck’s legacy, however, is not that he postulated the wrong processes for evolutionary change,
but that he proposed a process in the first place, and that he connected it to environmental fit.
Lamarck is therefore rightfully credited with being the first “Evolutionist”.
Evolution via natural selection can occur only if there is variation in the population to begin with.
From one generation to the next, the struggle for resources (what Darwin called the “struggle for
existence”) will favour individuals with some variations over others and thereby change the
frequency of traits within the population. This process is natural selection. The traits that confer an
advantage to those individuals who leave more offspring are called adaptations.
In order for natural selection to operate on a trait, the trait must possess heritable variation and
must confer an advantage in the competition for resources. If one of these requirements does not
occur, then the trait does not experience natural selection. (We now know that such traits may
change by other evolutionary mechanisms that have been discovered since Darwin’s time.)
NOTE THAT THIS DOES NOT MEAN THAT EVOLUTION HAS A "GOAL" OR THAT THERE IS A "MOST
HIGHLY EVOLVED" SPECIES, AND IT IS NOT AN INEXORABLE MARCH TO AN "IDEAL PINNACLE SPECIES"
(which is most often defined as Homo sapiens by people who haven't a clue about biological
realities...). To use William Paley’s analogy – the watchmaker is “blind” – he has no sense of
direction. See Richard Dawkins’ The Blind Watchmaker.
Fitness can be defined either with respect to a genotype or to a phenotype in a given environment.
In either case, it describes individual reproductive success and is equal to the average contribution
to the gene pool of the next generation that is made by an average individual of the specified
genotype or phenotype. The term "Darwinian fitness" can be used to make clear the distinction
with physical fitness.
Industrial melanism is a phenomenon that affected over 70 species of moths in England. It has been
best studied in the peppered moth, Biston betularia. Prior to 1800, the typical moth of the species
had a light pattern (see Figure 2). Dark colored or melanic moths were rare and were therefore
collectors' items.
In the 1950s, the biologist Kettlewell did release-recapture experiments using both morphs. A brief
summary of his results is shown below. By observing bird predation from blinds, he could confirm
that the conspicuousness of a moth greatly influenced the chance that it would be eaten.
Galapagos finches are the famous example from Darwin's voyage. Each island of the Galapagos that
Darwin visited had its own kind of finch (14 in all), found nowhere else in the world. Some had beaks
adapted for eating large seeds, others for small seeds, some had parrot-like beaks for feeding on
buds and fruits, and some had slender beaks for feeding on small insects. One used a thorn to probe
for insect larvae in wood, like some woodpeckers do. (Six were ground-dwellers, and eight were tree
finches.) (This diversification into different ecological roles, or niches, is thought to be necessary to
permit the coexistence of multiple species, a topic we will examine in a later lecture.) To Darwin, it
appeared that each was slightly modified from an original colonist, probably the finch on the
mainland of South America, some 600 miles to the east.
We can look at selection in a statistical way. Suppose that each population can be portrayed as a
frequency distribution for some trait -- beak size, for instance. Note again that variation in a trait is
the critical raw material for evolution to occur.
Figures a-c
What will the frequency distribution look like in the next generation?
The blue colour shows us at which part of the frequency distribution selection is operating. First (a),
selection acts against extremes of the frequency distribution (individuals that are too small or too
large). In time, the frequency distribution tends to narrow around the median. This is stabilising
selection, probably the most common form of natural selection, and we often mistake it for no
selection. A real-life example is that of birth weight of human babies.
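The effect of stabilising selection on a frequency distribution can be sketched numerically. The example below is a deliberately crude model and rests on several assumptions: the beak sizes are simulated, selection is modelled as simple truncation (individuals more than one standard deviation from the mean do not survive), and inheritance is ignored, so it shows only what selection does within one generation.

```python
import random
import statistics


def stabilising_selection(trait_values, width=1.0):
    """Keep individuals within `width` standard deviations of the mean.

    A crude truncation model: individuals with extreme trait values
    do not survive to reproduce.
    """
    mean = statistics.fmean(trait_values)
    sd = statistics.pstdev(trait_values)
    return [x for x in trait_values if abs(x - mean) <= width * sd]


rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical beak sizes (mm) for 1000 birds.
population = [rng.gauss(10.0, 1.5) for _ in range(1000)]
survivors = stabilising_selection(population)

print(f"before: mean = {statistics.fmean(population):.2f}, "
      f"variance = {statistics.pvariance(population):.2f}")
print(f"after:  mean = {statistics.fmean(survivors):.2f}, "
      f"variance = {statistics.pvariance(survivors):.2f}")
```

The mean barely moves while the variance drops, which is why stabilising selection is easily mistaken for no selection at all.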
Long before Darwin and Wallace, farmers and breeders were using the idea of selection to cause
major changes in the features of their plants and animals over the course of decades. Farmers and
breeders allowed only the plants and animals with desirable characteristics to reproduce, causing
the evolution of farm stock.
Many species have been modified by breeders, e.g., cattle, sheep, dogs, flowers, vegetables. How it
is done: The offspring of each generation vary. The differences may be so small that only a trained
breeder can detect them. Those that are more like what the breeder wants are selected for further
breeding; the rest aren’t allowed to breed. This is repeated. Eventually the small differences add up
to a large change in the breed. Darwin calls this process artificial selection. This works in exactly the
same way as natural selection, except in this case, the environment does not select which individuals
survive and breed to the next generation, instead humans do the selection. An individual’s fitness is
thus determined by the selector. This can create striking differences in a short space of time.
As shown below, farmers have cultivated numerous popular crops from the wild mustard, by
artificially selecting for certain attributes.
Sexual selection is a "special case" of natural selection. Sexual selection acts on an organism's ability
to obtain (often by any means necessary!) or successfully copulate with a mate. In order to leave an
evolutionary legacy, survival is not enough. Individuals must also reproduce. Over 90% of species
reproduce sexually, meaning two individuals, one of each sex, must mate in order to produce offspring.
Reproduction is expensive and can exert an additional evolutionary pressure. Darwin defined this
pressure as sexual selection. Sexual selection operates through some members of a species having
an advantage over others in terms of mating. It is the selection for traits that are solely concerned
with increasing the mating success of an individual.
Darwin’s findings in relation to sexual selection were published in his book The Descent of Man and
Selection in Relation to Sex in 1871. Darwin observed that there are some characteristics that do not
appear to help an organism adapt to its environment and are thus not explained by natural
selection. He suggested that they feature in the process of sexual selection. He defined the process
by saying that it ‘depends on the advantage which certain individuals have over other individuals of
the same sex and species, in exclusive relation to reproduction’. His observations and analysis led
to the reasoning that sexual selection works in two main ways: either through competition among
members of one sex for access to members of the other sex (combat) or through choice by members
of one sex for certain members of the other sex (mate choice).
Combat
In terms of combat, males within a species compete with each other for access to the females. This
leads to larger and stronger males and to the development of male ‘weapons’ in order to give them
the advantage when in combat with other males. Darwin referred to many examples of this.
Elephant seals and walruses are examples of the increased size and strength of males. Elephant seals
annually migrate from their foraging ground to their breeding ground. The males arrive roughly two
weeks before the females and they then fight to gain the best breeding site and thus attract the
most females. Only the largest and strongest males are able to dominate in this competition.
Roughly 90% of males end up pup-less! Examples of male ‘weapons’ include the horns of male beetles,
the antlers of stags, the large canine teeth of male baboons and the tusks of male wild boar. In
addition, male competition can also be more subtle. For example, when some male insects mate with a
female they remove the sperm that is already present in the female because it is from previous males.
‘Display’ refers to the exhibition of ornate male features to potential female mates, such as the
striking, brightly coloured plumage of many male birds. Darwin suggested that this process of
selection operates through female choice, whereby females choose the most striking males to mate
with. This theory proposes that male ornaments are a genuine indication of the male’s
vitality. The presence of a costly ornament on a male tells the female that he is genetically
exceptionally healthy and thus her offspring will inherit his vitality.
A classic example, and one used by Darwin, is the flamboyant tail of the peacock. It appears to serve
no purpose in terms of survival and may actually be a handicap - a disadvantage in the struggle
for existence - as the tail makes it more difficult for the bird to escape from predators. The
explanation in terms of sexual selection for its presence is that peahens are attracted to it, and there
are various suggestions as to why they like it. Generally sexually selected traits either signal fitness
directly, as their development depends on good health or a good diet, or indirectly when they signal
the ability of the male to survive despite the large cost imposed by the fancy plumage.
Darwin proposed that this process of sexual selection would work in the following way. In the past,
when peacocks had tails of ordinary colour and length, peahens showed a preference to mate with males
whose tails were slightly longer and more flamboyant than average. Thus, the characteristic of slightly
longer, more brightly coloured tails would be passed on to the next generation and over many
generations the peacocks' tails would become longer and brighter. Thus, the ornate tail gives such an
advantage in terms of mating success that it is selected for despite being a disadvantage in terms of
general survival. Darwin thus argued that these flamboyant male characteristics were not, as
believed at the time, due to a designer who had an aesthetic sense, but due to the need to attract a
mate. Other examples where males are more striking than the females are found in fish, lizards and
spiders.
Darwin’s main evidence for sexual selection was the fact that he found, from a comparison of a great
number of species, that there is a greater difference between the sexes (greater sexual dimorphism)
in polygynous species than monogamous species. A polygynous species is a species where one male
mates with several females, whereas in monogamous species a single male pairs with a single
female. Darwin reasoned that in polygynous species secondary sexual characters will be more
developed to enable the males to gain access to more females (via combat or display). Therefore,
polygynous species should, and were indeed found to, have greater sexual dimorphism.
Darwin struggled with the problem of inheritance. He knew that traits had to be heritable, that is,
passed on from parent to offspring, otherwise his theory of natural selection would not have
worked. However, he did not know how organisms passed traits on to their offspring. Why did some
traits seem to be passed on and others not? How did the traits of the parents work together in the
offspring - did they compete, or combine?
“Hypotheses may often be of service to science, when they involve a certain portion of
incompleteness, and even error”.
Gregor Johann Mendel was born in the Austrian Empire in a town that is today in the Czech
Republic. He studied physics and maths at University, but after graduating he began studying to be a
monk, joining the Augustinian order at the St. Thomas Monastery in Brno. In those days the
monastery was a cultural centre, and Mendel was immediately exposed to the research and teaching
of its members, and also gained access to the monastery’s extensive library and experimental
facilities. In 1851, he was sent to the University of Vienna, at the monastery’s expense, to continue
his studies in the sciences. While there, Mendel studied mathematics and physics under Christian
Doppler and botany under Franz Unger. In 1853, upon completing his studies in Vienna, Mendel
returned to the monastery in Brno and was given a teaching position at a secondary school, where
he began the experiments for which he is best known.
Mendel read Darwin with deep interest, but he disagreed with the blending notion - how could a
single fortunate mutation be spread through a species? It would be blended out, just as a single drop
of white paint would be in a gallon of black paint. Instead, Mendel hypothesized that traits, such as
eye colour or height or flower hues, were carried by tiny particles that were inherited whole in the
next generation. This was the birth of particulate inheritance, and these tiny particles eventually
came to be known as genes.
Mendel chose to use peas for his experiments due to their many distinct varieties, and because
offspring could be quickly and easily produced. He cross-fertilized closely related pea plants that had
a small number of clearly opposite characteristics—tall with short, smooth with wrinkled, those
containing green seeds with those containing yellow seeds, etc. Mendel's carefully
designed and meticulously executed experiments involved nearly 30,000 pea plants followed
over eight generations.
The seven traits of pea plants that Mendel chose to study: seed wrinkles; seed color; seed-coat color, which
leads to flower color; pod shape; pod color; flower location; and plant height. Image by Mariana Ruiz.
When Mendel bred purple-flowered peas (BB) with white-flowered peas (bb), every plant in the next
generation had only purple flowers (Bb). When these purple-flowered plants (Bb) were bred with
one-another to create a second-generation of plants, some white flowered plants appeared again
(bb). Mendel realized that his purple-flowered plants still held instructions for making white flowers
somewhere inside them. He also found that the number of purple to white was predictable. 75
So an adaptive mutation could spread slowly through a species and never be blended out. Darwin's
theory of natural selection, building on small mutations, could work. But no one at the time
understood the implications of Mendel's experiments. He soon left biology to focus on running his
monastery. He died in 1884 having made little impact on the scientific world. It was not until 1900
that Mendel's work was rediscovered, by Carl Correns in Germany, Hugo de Vries in the Netherlands
and Erich von Tschermak-Seysenegg in Austria. Only then did Mendel -- who had worked without a
microscope, without computers, but with a thoughtful hypothesis, a carefully designed experiment,
and enormous patience -- receive the credit for one of the great discoveries in the history of science.
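The flower colour crosses described above (BB x bb, then Bb x Bb) can be worked through with a short Python sketch of a Punnett square. The function name is invented for this example; the only biological assumption is the one stated in the text, that purple (B) is dominant over white (b).

```python
from collections import Counter
from itertools import product


def monohybrid_cross(parent1, parent2):
    """Count offspring genotypes from every pairing of parental alleles."""
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "bB" and "Bb" are counted as the same genotype
        # (uppercase letters sort before lowercase ones).
        offspring["".join(sorted((allele1, allele2)))] += 1
    return offspring


f1 = monohybrid_cross("BB", "bb")
print(dict(f1))  # {'Bb': 4} -> every F1 plant is purple

f2 = monohybrid_cross("Bb", "Bb")
purple = sum(n for genotype, n in f2.items() if "B" in genotype)
white = f2["bb"]
print(dict(f2))       # {'BB': 1, 'Bb': 2, 'bb': 1}
print(purple, white)  # prints "3 1", i.e. 3 purple : 1 white
```

Because the white allele is carried unchanged in the Bb plants, it reappears in one quarter of the second generation rather than being blended away.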
Important terms
hybrid
F1
F2
phenotype
genotype
gene
alleles
homozygous
heterozygous
test cross
Dihybrid cross
A. Cross
WG Wg wG wg
WG
Wg
wG
wg
round, yellow?
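The 16 cells of the dihybrid square above can be filled in and counted programmatically. The sketch below assumes, for illustration only, that W (round) is dominant over w (wrinkled) and G (yellow) is dominant over g (green), and that both parents are WwGg and so produce the four gamete types shown in the table headers.

```python
from collections import Counter
from itertools import product

GAMETES = ("WG", "Wg", "wG", "wg")  # row and column headers of the square


def phenotype(gamete1, gamete2):
    """Phenotype of the offspring formed by two gametes.

    Assumed for illustration: W (round) is dominant over w (wrinkled)
    and G (yellow) is dominant over g (green).
    """
    shape = "round" if "W" in (gamete1[0], gamete2[0]) else "wrinkled"
    colour = "yellow" if "G" in (gamete1[1], gamete2[1]) else "green"
    return f"{shape}, {colour}"


# Fill in all 16 cells of the Punnett square and tally the phenotypes.
counts = Counter(phenotype(g1, g2) for g1, g2 in product(GAMETES, repeat=2))
for name, n in counts.items():
    print(f"{name}: {n}/16")
```

Under these assumptions the tally gives the classic 9:3:3:1 ratio, with 9 of the 16 cells being round and yellow.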
eg. 3: probability that offspring of heterozygous father will inherit H, the dominant
mutation responsible for Huntington’s chorea.
1. limits of probability
a. if event is certain:
eg. 4: Probability that 4th child will be girl if the first 3 children were girls?
Lethal Alleles
Essential genes are those that are absolutely required for survival. The absence of their protein
product leads to a lethal phenotype. It is estimated that about 1/3 of all genes are essential for
survival.
A lethal allele is one that has the potential to cause the death of an organism. These alleles are
typically the result of mutations in essential genes. They are usually inherited in a recessive manner.
Many lethal alleles prevent cell division; these will kill an organism at an early age.
Conditional lethal alleles may kill an organism only when certain environmental conditions prevail.
Semilethal alleles
Traits that appear to be determined by systems of complete dominance at the gross phenotypic
level may be cases of incomplete dominance at the biochemical level.
Codominance
Sometimes, traits associated with both alleles are observable in a heterozygote. An example in cattle
is the roan coat color (mixed red and white hairs), which occurs in the heterozygous (Rr) offspring of
red (RR) and white (rr) homozygotes.
• A) roan x roan
• B) red x white
• C) white x roan
• D) red x roan
• E) All of the above crosses would give the same percentage of roan.
Overdominance
Overdominance is the phenomenon in which a heterozygote is more vigorous than both of the
corresponding homozygotes. It is also called heterozygote
advantage or heterosis.
Two alleles
HbSHbS individuals have red blood cells that deform into a sickle shape under conditions of low
oxygen tension. This has two major ramifications:
2. Odd-shaped cells clump. Partial or complete blocks in capillary circulation. Thus, affected
individuals tend to have a shorter life span than unaffected ones.
The sickle cell allele has been found at a fairly high frequency in parts of Africa where malaria is
found.
Malaria is caused by a protozoan, Plasmodium. This parasite undergoes its life cycle in two main
parts: one inside the Anopheles mosquito, the other inside red blood cells. Red blood cells of
heterozygotes are likely to rupture when infected by Plasmodium sp. This prevents the propagation
of the parasite.
At the molecular level, overdominance is due to two alleles that produce slightly different proteins.
But how can these two protein variants produce a favorable phenotype in the heterozygote?
Well, there are three possible explanations for overdominance at the molecular/cellular level:
1. Disease resistance
2. Homodimer formation
Pleiotropy
Polygeny
Organelle DNA
So now we have learned about Darwin’s natural selection, and Mendel’s inheritance. Initially, these
new theories prompted disagreements in the scientific community, and they were first believed to
be contrary to each other. That is because the drastic changes of Mendelian genetics, from round to
wrinkled peas with no intermediate, seem to clash with Darwin's idea of gradual change over long
periods.
But ultimately these ideas were understood to be complementary: Genes are the means through
which information is inherited, and they are passed on through the germ cells. If an individual’s
genes give it an advantage, it is more likely to survive and pass on those genes to its offspring – that
is, increase the individual’s fitness. This combination of genetics and natural selection is termed
the Modern Synthesis, and is the cornerstone of modern evolutionary understanding.
“Instead of the varied theories of evolution which arose in different branches of biology, we are now
witnessing the emergence of a new science of life united by the great evolutionary idea”
Dobzhansky (1951)
Hardy-Weinberg Equilibrium
Evolution is simply a change in frequencies of alleles in the gene pool of a population. For instance,
let us assume that there is a trait that is determined by the inheritance of a gene with two alleles--
B and b. If the parent generation has 92% B and 8% b and the offspring generation have 90% B and
10% b, then you can safely say that evolution has occurred between the generations. The entire
population's gene pool has evolved towards a higher frequency of the b allele--it was not just those
individuals who inherited the b allele who evolved.
In 1908, Godfrey Hardy, an English mathematician, and Wilhelm Weinberg, a German physician,
concluded that gene pool frequencies are inherently stable but that evolution should be expected in
all populations virtually all of the time.
Well, consider the opposite question: What has to be true in order for evolution to never occur?
Hardy and Weinberg showed that evolution will not occur in a population if seven conditions are
met:
If these seven conditions are met, the gene pool frequencies in a population will remain unchanged
– evolution would not occur. However, since it is highly unlikely that even one of these seven
conditions, let alone all of them, will be met in the real world, evolution is the inevitable result.
Hardy and Weinberg went on to develop a simple equation that can be used to discover
the probable genotype frequencies in a population and to track their changes from one generation
to another. This has become known as the Hardy-Weinberg equilibrium equation. In this equation
(p² + 2pq + q² = 1), p is defined as the frequency of the dominant allele and q as the frequency of the
recessive allele for a trait controlled by a pair of alleles (A and a). In other words, p equals the
frequency of individuals who are homozygous dominant (AA) plus half the frequency of individuals who
are heterozygous (Aa) for this trait in a population. In mathematical terms, with AA, Aa and aa
standing for the genotype frequencies, this is
p = AA + ½Aa
Likewise, q equals the frequency of individuals who are homozygous recessive (aa) plus the other
half of the frequency of individuals who are heterozygous (Aa).
q = aa + ½Aa
Because there are only two alleles in this case, the frequency of one plus the frequency of the other
must equal 100%, which is to say
p+q=1
Since this is logically true, then the following must also be correct:
p=1-q
There were only a few short steps from this knowledge for Hardy and Weinberg to realize that the
chances of all possible combinations of alleles occurring randomly is
(p + q)² = 1
or more simply
p² + 2pq + q² = 1
To this day, Hardy and Weinberg’s equation is known as the null model for population genetics. This
is because it will only hold true if evolution DOES NOT occur. That means, if we have a population
that is experiencing one or all of these situations:
then its gene frequencies will be shifted away from Hardy Weinberg expectation. In other words – it
will be evolving! We use shifts away from Hardy-Weinberg equilibrium to determine if a population
is in fact evolving. Note that this list is the exact opposite of the list of conditions for no evolution
to occur.
Figure: Under Hardy-Weinberg equilibrium, the frequencies of homozygous and heterozygous genotypes
vary in a very predictable way with allele frequencies.
Say we have a population of 40 individuals. Four individuals have the homozygous recessive
genotype (aa), 16 have the homozygous dominant genotype and the rest are heterozygotes. Is the
population in Hardy-Weinberg equilibrium?
Then we use a Chi-square distribution to test the null hypothesis that the population IS in Hardy-
Weinberg equilibrium.
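The example above (16 AA, 20 Aa and 4 aa individuals) can be worked through as a short sketch. The function name is hypothetical, and the critical value of 3.841 assumes a 5% significance level with one degree of freedom (three genotype classes, minus one, minus one allele frequency estimated from the data).

```python
def hardy_weinberg_chi_square(n_AA, n_Aa, n_aa):
    """Return allele frequencies, expected genotype counts and the chi-square statistic."""
    n = n_AA + n_Aa + n_aa
    # Each individual carries two alleles, so there are 2n alleles in total.
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    # Expected counts under Hardy-Weinberg: p^2, 2pq and q^2 times the sample size.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi2


p, q, expected, chi2 = hardy_weinberg_chi_square(n_AA=16, n_Aa=20, n_aa=4)
CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, alpha = 0.05

print(f"p = {p:.2f}, q = {q:.2f}")
print("expected counts (AA, Aa, aa):", [round(e, 1) for e in expected])
print(f"chi-square = {chi2:.3f}")
print("reject Hardy-Weinberg equilibrium:", chi2 > CRITICAL_VALUE)
```

Here p = 0.65 and q = 0.35, the expected counts are about 16.9, 18.2 and 4.9, and the chi-square value of about 0.39 is far below 3.841, so the null hypothesis that the population is in Hardy-Weinberg equilibrium is not rejected.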
The Hardy-Weinberg Equilibrium provided both a "null hypothesis" for genetic evolution and a
mathematical basis for a more comprehensive theory of evolution in which natural selection,
Mendelian genetics, paleontology, and comparative anatomy were combined in what is now known
as the modern evolutionary synthesis. During the 1930s and 1940s, R. A. Fisher, J. B. S. Haldane,
Sewall Wright, and Theodosius Dobzhansky developed mathematical models for fitness, selection,
and other evolutionary processes. These models were then applied to demographic data derived
from artificial and natural populations of organisms in a rigorous (and ongoing) test of the validity of
the neo-darwinian model for genetic evolution. As a result of their work, Darwin's theories of natural
and sexual selection were combined with Mendelian genetics, biometry and statistics, demography,
paleontology, comparative anatomy, botany, and (more recently) molecular genetics and ethology
to produce a "grand unified theory" of the origin and evolution of life on Earth.
A solution to this problem was provided by Sewall Wright, who discovered a process that has
eventually become known as genetic drift. Wright proposed that in small populations of organisms,
random sampling errors could cause significant changes in allele frequencies in those populations.
He showed mathematically that the smaller a population was, the greater the effect of such
random events on its allele frequencies. In other words, the effect of genetic drift was inversely
proportional to population size.
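The inverse relationship between drift and population size can be illustrated with a small sketch. It assumes the standard Wright-Fisher sampling model (which these notes do not name), in which the 2N allele copies of each generation are drawn at random from the previous one; under that assumption the variance of the change in allele frequency in one generation is p(1 - p)/2N. The function names and the seed are illustrative only.

```python
import random


def drift_variance(p, n_individuals):
    """Variance of the allele frequency change after one generation of random sampling."""
    return p * (1 - p) / (2 * n_individuals)


def simulate_drift(p0, n_individuals, generations, seed=1):
    """Follow one allele frequency through time under random sampling alone."""
    rng = random.Random(seed)
    copies = 2 * n_individuals  # diploid population: two allele copies per individual
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each allele copy in the next generation is drawn at random from the current pool.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


for n in (10, 100, 1000):
    print(f"N = {n:4d}: variance per generation = {drift_variance(0.5, n):.6f}")

small = simulate_drift(0.5, n_individuals=10, generations=20)
large = simulate_drift(0.5, n_individuals=1000, generations=20)
```

Increasing N tenfold reduces the per-generation variance tenfold, which is the sense in which the effect of drift is inversely proportional to population size.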
Population geneticists also refer to this as gene flow. This simply means the movement of alleles
between populations. Migration in this evolutionary sense can occur by dispersal of adult animals,
seeds and spores of plants, planktonic larvae of intertidal animals, gametes/zygotes of
algae, etc.
Effects of migration on allele frequencies: In the absence of selection (i.e. if alleles are selectively
neutral) migration homogenizes allele frequencies among populations. If selection and migration
tend to increase the frequencies of the same alleles, selection can amplify the effect of migration.
On the other hand, if selection is stronger than migration, then differences among populations will
be maintained, even in the face of migration. And finally, if migration is stronger than selection,
differences among populations will be reduced.
Directional selection can be balanced by influx of 'immigrant' alleles; thus, a stable 'equilibrium' can
be reached if migration rate is constant. However in some cases, migration can hinder optimal
adaptation of a population to local conditions. An example is the water snake (Natrix sipedon) living
on islands in Lake Erie. Island snakes are mostly unbanded, whereas on the adjacent mainland all are
banded. Banded snakes are not cryptic on the limestone islands and are eaten by gulls, while unbanded
island snakes are not eaten. But, because of recurrent migration from the mainland, the banded
phenotype becomes more frequent on the islands, counteracting the directional selection for unbanded snakes.
Wright linked migration and drift in his island model using the equation
FST = 1/(4Nem + 1)
Where FST is a measure of genetic differentiation between two populations (we will get to it in the
next section), Ne is the effective population size (the number of breeding individuals) and m is the
migration rate. This formula assumes that migration and drift are always in equilibrium. Thus, the
effects of drift and migration on any population will be equal but opposite – drift will reduce the
number of alleles in each population while migration between the populations will increase it. Their
relative effects will increase or decrease in proportion to the population size,
but they will always be in equilibrium.
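As a rough illustration of this balance, the sketch below assumes the standard form of Wright's island-model approximation, FST = 1/(4Nem + 1), and evaluates it for a few values of Nem (the number of migrants per generation). The function name and the example values are hypothetical.

```python
def fst_island_model(ne, m):
    """Expected FST at migration-drift equilibrium under Wright's island model.

    ne -- effective population size
    m  -- migration rate (proportion of migrants per generation)
    """
    return 1.0 / (4.0 * ne * m + 1.0)


# Only the product Ne * m (number of migrants per generation) matters.
for ne, m in [(1000, 0.00025), (1000, 0.001), (1000, 0.01)]:
    print(f"Ne*m = {ne * m:5.2f} -> FST = {fst_island_model(ne, m):.3f}")
```

With one migrant per generation (Nem = 1) the expected FST is 0.2; fewer migrants let drift increase differentiation, and more migrants reduce it.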
However, in some cases, the equilibrium can be disrupted, for example when one population has
been recently founded by colonists from another population and migration is only in one direction.
Theodosius Dobzhansky
If this isolation lasted long enough, Dobzhansky argued, the flies might lose the ability to interbreed
completely. They might simply become unable to mate with the other flies successfully, or their
hybrid offspring might become sterile. If the flies were now to come out of their isolation, they could
live alongside the other insects but still continue mating only among themselves. A new species
would be born.
Ernst Mayr
Mayr specialized in discovering new species of birds and mapping out their ranges. It was no easy
matter determining exactly which group of birds deserves the title of species. Biologists typically
tried to bring order to this confusion by recognizing subspecies — local populations of a species that
were distinct enough to warrant a special label of their own. But Mayr saw that the subspecies label
was far from a perfect solution. In some cases, subspecies weren't actually distinct from each other,
but graded into each other like colours in a rainbow. In other cases, what looked like a subspecies
might, on further inspection, turn out to be a separate species of its own.
Mayr realized that it was possible to explain the origin of species with genetics. Mayr also realized
that the puzzle of species and subspecies shouldn't be considered a headache: they were actually a
living testimony to the evolutionary process Dobzhansky wrote about. Variations emerge in different
parts of a species' range, creating differences between populations (see example of bird crests
below). In one part of a range the birds may possess long tails, in others, square tails. But because
the birds also mate with their neighbours, they do not become isolated into a species of their own.
A. Populations have genetic variation that continuously arises by undirected processes (mutation
and recombination)
C. Most adaptive variants have slight phenotypic effects, so that phenotypic changes are gradual
E. These processes, continued for a sufficiently long period of time, produce changes sufficient to
delineate higher taxonomic levels
Before the 1960s there were no data about protein or DNA variation. Remember that the structure of
DNA had only been discovered in the 1950s and the genetic code was only deciphered in the 1960s (see
Section, The Central Dogma). During this
time, natural selection was believed to be the main driver of evolution, but there were two schools
of population geneticists: the Classical and Balance schools. The Classical school believed that
polymorphisms (the existence of more than one allele at a locus in a population) were rare. They
argued that natural selection was mainly a purifying force that removed any deleterious alleles that
may arise or would drive any advantageous alleles to fixation. Therefore, they believed that
individuals were homozygous for most loci.
However, in the mid-1960s the technique of protein electrophoresis was developed, allowing
investigation into the levels of isozyme polymorphism (see Lectures on Genetic Markers). The results
showed that large amounts of genetic variation were present in natural populations, much more
than was expected by the Classical school. This appeared to be a victory for the Balance school.
Segregational Load
However, it was soon discovered that to maintain these high levels of polymorphism at thousands of
loci by balancing selection would be very costly. Summed over multiple loci this high segregational
load would be large enough to drive populations to extinction!
Motoo Kimura
However, the high levels of polymorphism can be explained without encountering excessive genetic
load simply by dropping the assumption that natural selection is the driving force of molecular
evolution. Instead, Kimura allowed the majority of mutations that were fixed to be neutral and
therefore to have no effect on fitness. This was a huge change in perspective from both the Classical
and Balance schools, because the neutral theory stated for the first time that natural selection
is not the main driving force of molecular evolution.
If µ = mutation rate per gene per generation, and Ne = effective population size
Kimura suggested that genetic drift was a more important evolutionary force than selection. Under
the neutral model, the amount of genetic variation in a population is determined by a balance
between an increase due to mutation rate (= μ) and a decrease due to finite population size
(=genetic drift). All new mutations in a population have the same chance of drifting to fixation. A
mutation can either drift to fixation - when its frequency is 1 (or when all individuals in a population
have that mutation) or it can drift to extinction (when it is lost, frequency =0). Most of the time a
new neutral allele will be quickly lost from the population by genetic drift. But sometimes it will drift
into the population and get fixed, that is, it will replace (or substitute) the original allele in the
population.
The probability that the new allele will drift to fixation = 1/2N (this is equivalent to the
probability of reaching into a bag containing 2N black marbles and pulling out the only red
marble in the bag).
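Because 2Nµ new mutations arise in a diploid population each generation, and each has a 1/2N chance
of drifting to fixation, the rate of substitution is 2Nµ × 1/2N = µ. This means that the rate of
neutral molecular evolution is independent of population size and is simply equal to the neutral
mutation rate.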
Therefore, while the rate of fixation of new neutral mutations (µ) is independent of population
size, the time a mutation takes to progress through the population is proportional to the population
size. Under the neutral theory, polymorphisms in a large population are therefore simply a result of
lots of neutral mutations arising and passing through the population at a slow rate such that at any
one time there are several different alleles at a particular locus drifting through the population.
According to the neutralists, most mutations are either deleterious and are selectively removed, or
are “effectively neutral”, in which case there is a small probability that they are fixed. Natural
selection is incorporated, but as a purifying force, removing deleterious mutations but with only
a small role in fixing new mutations. As we have seen above, the probability of fixation of a
neutral allele by drift is 1/2N. If this probability is bigger than the selection pressure, the
influence of drift is greater than that of selection and the mutation is effectively neutral. So, the
neutral theory does not argue that most mutations are completely neutral, but that any selection
pressures are outweighed by the effects of drift. The neutral theory conferred a much greater role
for genetic drift in molecular evolution.
Kimura was also able to derive a new formula for Heterozygosity (H) levels under the neutral model.
Here, the most important thing to remember is the “neutral parameter” 4Neμ:
H = 4Neμ/(4Neμ + 1)
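To get a feel for these formulas, the sketch below evaluates the expected heterozygosity H = 4Neμ/(4Neμ + 1) and the neutral substitution rate (2Nμ new mutations per generation, each with fixation probability 1/2N) for a few population sizes. The function names, the mutation rate of 1e-6 and the population sizes are illustrative assumptions, not values from the module.

```python
def expected_heterozygosity(ne, mu):
    """Heterozygosity expected under the neutral model: H = 4Ne*mu / (4Ne*mu + 1)."""
    theta = 4 * ne * mu  # the "neutral parameter"
    return theta / (theta + 1)


def neutral_substitution_rate(n, mu):
    """New mutations per generation (2N*mu) times their fixation probability (1/2N)."""
    return (2 * n * mu) * (1 / (2 * n))


MU = 1e-6  # hypothetical mutation rate per gene per generation
for ne in (10_000, 100_000, 1_000_000):
    h = expected_heterozygosity(ne, MU)
    k = neutral_substitution_rate(ne, MU)
    print(f"Ne = {ne:>9,}: H = {h:.3f}, substitution rate = {k:.1e}")
```

Heterozygosity rises with population size (about 0.04, 0.29 and 0.80 here), while the substitution rate stays equal to the mutation rate whatever the population size.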
If the neutral theory is correct, we should see the following two predictions:
2. There is an inverse relationship between the rate of substitution (µ) and the degree of functional
constraint acting on a gene, such that functionally constrained genes or gene regions evolve at a
lower rate and vice versa.
Positive selection has been shown in substitution rates within the major histocompatibility complex
(MHC, a family of genes that determine resistance to pathogens) and HIV envelope proteins.
In the previous sections we have looked at the basic molecule of inheritance, how we have been
able to manipulate these molecules, how variation in the form of mutation is generated and the
forces that cause this variation to change with time, to evolve. Now we will look at how this variation
is structured in the living world. We will learn about the history of attempts at partitioning the
variation of life into categories and the details of modern-day methods of assessing genetic
structure.
This section closely follows Chapter 4 of your textbook Evolution 1st Edition by Bergstrom and
Dugatkin.
Chapter 15, Hartl and Jones, Genetics – Principles and Analysis (4th Edition).
Chapters 4, 14, Bergstrom and Dugatkin Evolution (1st Edition)
Section Outcomes.
Categorizing the continuous variation we see in nature is a difficult task. What is the best way
to do it?
Carolus Linnaeus
In the Northern Renaissance, however, the scientific focus shifted to biology. The taxonomic
system was developed by Carolus Linnaeus (1707–1778), a Swedish botanist, zoologist, and physician,
who wrote the Systema Naturae. This taxonomy has proved so very useful because of Linnaeus’ insight
that organisms can be arranged in a hierarchical classification. Linnaeus recognized that not only
can we assign species or subspecies to groups of highly similar organisms, we can also array these
groups of similar species into larger groups of moderately similar organisms, and these larger
groups can in turn be
categorized into yet larger groups of somewhat similar organisms, and so forth, until we have
accounted for all living things. It is important to note that Linnaeus came to this realization without
having a theoretical basis for why these hierarchical patterns of similarity should exist.
The German biologist Willi Hennig eventually revisited the problem of taxonomy using Darwin’s
ideas and, in doing so, established the modern approach to classification (Zuckerkandl and Pauling
1962; Hennig 1966). His classic 1966 book—Phylogenetic Systematics—is instructive, because it
emphasizes that, in addition to documenting evolutionary history, phylogenetic trees can help us
classify, or systematize, the world we see around us. We could classify organisms in many ways—for
example, by how large they are, by where they are located, or by their morphology. But in
phylogenetic systematics, we classify organisms according to their evolutionary histories—and
phylogenetic trees are our way of representing these evolutionary relationships. But how do we
build phylogenetic trees?
Traits
The study of phylogeny rests on our observations of traits displayed by organisms. You are already
familiar with the concept of a trait from previous lectures, but formally, a trait can be any observable
characteristic of organisms; for example traits may be anatomical features, developmental or
embryological processes, behavioural patterns, or genetic sequences. Until the major advances in
molecular genetics that occurred in the 1970s, almost all trait measurements used in the study of
phylogeny were morphological or anatomical—bone length, tooth shape, and so on. With the
advent of molecular genetics, actual DNA sequences are now the most common traits used to
reconstruct phylogenies of extant organisms.
Reading Phylogenies
The top or root of a phylogeny is the first point in time captured by the phylogeny. All subsequent
branching patterns occurred more recently in time until the last branches (or leaves) of the tree,
which usually signify the present day.
See also:
Rotating Phylogenies
Branch Lengths
When we use traits, whether they are morphological features like bones or beaks or DNA sequences, to
infer common ancestry and thereby build phylogenies, we order them based on their similarity. You
will be forgiven for thinking that the more similar two traits are the more likely they are to have a
recent common ancestor, and likewise the more dissimilar two traits are the more likely they are to
not share a recent common ancestor – not so!
Parsimony
Also known as cladistics. The fundamental idea behind parsimony is that the best phylogeny is the
one that both explains the observed character data and requires the fewest evolutionary changes –
exactly the same as the Occam's razor principle – the theory that has the fewest assumptions is the
preferred one. In parsimony, traits are referred to as characters, and they are able to change their
state, e.g. if the homologous trait “fur” is a character, then “white” and “dark” fur colour are
character states. The best or most parsimonious tree is the one with the fewest character state
changes.
Advantages
Simplicity
Disadvantages
Not statistically consistent. The tree produced does not always have the highest likelihood.
Slow
A most parsimonious tree might never be found, depends on the quality of the data.
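Counting character state changes on a given tree can be sketched as follows. This uses Fitch's algorithm, one standard way of scoring a single character on a rooted binary tree; the notes do not name a specific algorithm, so treat it as an assumed choice. The four taxa, their fur colours and the two candidate trees are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of state changes).

    tree   -- a taxon name (leaf) or a pair (left_subtree, right_subtree)
    states -- dict mapping each taxon name to its character state
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra change is needed here.
        return shared, left_changes + right_changes
    # The subtrees disagree: at least one change must occur at this node.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical data: the character "fur" with states "white" and "dark".
FUR = {"A": "white", "B": "white", "C": "dark", "D": "dark"}
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))

changes_1 = fitch(TREE_1, FUR)[1]
changes_2 = fitch(TREE_2, FUR)[1]
print("tree 1 changes:", changes_1)
print("tree 2 changes:", changes_2)
```

Tree 1 needs one change and tree 2 needs two, so tree 1 is the more parsimonious explanation of this character. A real analysis sums such counts over many characters and searches over many candidate trees, which is why parsimony can be slow.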
Bayesian inference
Similar to maximum likelihood in that it uses the same evolutionary models, but otherwise very
different. Bayesian phylogenetic analysis uses Bayes' theorem, which relates the posterior
probability of a tree to the likelihood of data, and the prior probability of the tree and model of
evolution. Unlike parsimony and likelihood methods, Bayesian analysis does not produce a single
tree or set of equally optimal trees. Bayesian analysis uses the likelihood of trees in a Markov chain
Monte Carlo (MCMC) simulation to sample trees in proportion to their posterior probability, thereby producing
a credible sample of trees.
Advantages
Faster than maximum likelihood
Can combine different kinds of data (eg. molecular and morphological).
Offers the possibility of setting priors based on previously obtained data
Disadvantages
Priors, if not chosen appropriately, could introduce bias
Convergence of the Markov Chain can be difficult to assess
MCMC methods can be described in three steps. First, a new state for the Markov chain is proposed
using a stochastic mechanism. Secondly, the acceptance probability of this new state is calculated.
Thirdly, a random number between 0 and 1 is drawn. If this value is less than the acceptance
probability, the new state is accepted and the state of the chain is updated. In this way the MCMC
avoids searching for the solution to the problem in all of the possible probability space, and
concentrates instead on those areas where the probability is higher than where it is at the moment.
This can be illustrated by the robot walking up the hill below. He is prevented from walking down the
hill and immediately goes back to the top and tries to get to a higher point.
This process is run for thousands or millions of iterations until the simulation has reached a
stationary distribution, that is, until it has converged. The proportion of time a single
tree is visited during the course of the chain is a valid approximation of its posterior probability.
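The three steps can be sketched with a toy Metropolis sampler. This is not a phylogenetic program: the three "trees" and their unnormalised posterior weights are made up for illustration, the proposal simply picks a tree uniformly at random, and the function name and seed are hypothetical.

```python
import random
from collections import Counter

# Hypothetical, unnormalised posterior weights for three candidate trees.
WEIGHTS = {"tree_A": 6.0, "tree_B": 3.0, "tree_C": 1.0}


def metropolis_sample(weights, n_steps, seed=42):
    """Return the fraction of steps the chain spends on each state."""
    rng = random.Random(seed)
    states = sorted(weights)
    current = states[0]
    visits = Counter()
    for _ in range(n_steps):
        # Step 1: propose a new state with a stochastic (here uniform) mechanism.
        proposal = rng.choice(states)
        # Step 2: calculate the acceptance probability of the proposed state.
        accept_prob = min(1.0, weights[proposal] / weights[current])
        # Step 3: draw a random number in [0, 1) and accept if it is smaller.
        if rng.random() < accept_prob:
            current = proposal
        visits[current] += 1
    return {state: visits[state] / n_steps for state in states}


freqs = metropolis_sample(WEIGHTS, n_steps=50_000)
for state, freq in freqs.items():
    print(f"{state}: visited {freq:.3f} of the time")
```

With these weights the true posterior probabilities are 0.6, 0.3 and 0.1, and the visit frequencies of a long chain should come close to them, which is the sense in which the time spent on a tree approximates its posterior probability.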
One approach to assessing how well a tree represents all of the data is to resample the data
repeatedly and reperform the phylogenetic analysis to see how often the same result is obtained
from these resampled (and nonidentical) datasets. Resampling can be done by bootstrapping in
which the characters (e.g., alignment columns) are resampled with replacement, or by jackknifing, in
which the characters are resampled without replacement. Frequently, 100 or 1000 of these new
resampled datasets are generated and a phylogenetic tree is built from each of them. The new trees
are then compared to determine in what fraction of the trees particular evolutionary groupings are
found. It is very important to realize that these tests do not determine how accurate a tree is, just
how well it reflects the underlying data.
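The resampling step of bootstrapping can be sketched as below. The tiny alignment, the function name and the seed are hypothetical, and the tree-building step that would follow each replicate is left out.

```python
import random

# Hypothetical alignment: three taxa, six aligned sites (columns).
ALIGNMENT = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "ACCTTG"}


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_cols = len(next(iter(alignment.values())))
    # Draw column indices with replacement: some columns repeat, others drop out.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


rng = random.Random(0)
replicates = [bootstrap_alignment(ALIGNMENT, rng) for _ in range(100)]
print(replicates[0])
```

Each replicate has the same taxa and the same number of columns as the original. A tree would then be built from every replicate, and the support for a grouping is the fraction of those trees in which it appears.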
Phylogenetic networks
Cluster algorithms
These are commonly used in genetic studies. They typically use multilocus data from genetic
markers including SNPs, microsatellites, RFLPs and AFLPs. Their uses include inferring the presence
of distinct populations, assigning individuals to populations, studying hybrid zones, identifying
migrants and admixed individuals, and estimating population allele frequencies in situations where
many individuals are migrants or admixed.
Wright’s FST
The fixation index (FST) is probably the most reported measure of population genetic
differentiation due to genetic structure. It is frequently estimated from genetic polymorphism data,
such as single-nucleotide polymorphisms (SNP) or microsatellites. It was developed by Sewall
Wright. The most commonly used definition for FST at a given locus is based on the variance of allele
frequencies between populations
If p̄ is the average frequency of an allele in the total population, σ²S is the variance in the frequency
of the allele between different subpopulations, weighted by the sizes of the subpopulations,
and σ²T is the variance of the allelic state in the total population, then FST is defined as
FST = σ²S / σ²T = σ²S / (p̄(1 − p̄))
This definition illustrates that FST measures the amount of genetic variance that can be explained by
population structure. This can also be thought of as the fraction of total diversity that is not a
consequence of the average diversity within subpopulations, where diversity is measured by the
probability that two randomly selected alleles are different, namely 2p(1 − p).
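The variance-based definition can be sketched for a single biallelic locus. The code assumes FST = σ²S / (p̄(1 − p̄)), where p̄ is the size-weighted mean allele frequency and σ²S is the size-weighted variance of the allele frequency among subpopulations. The function name and the example frequencies are hypothetical.

```python
def fst_from_frequencies(freqs, sizes):
    """FST for one biallelic locus from subpopulation allele frequencies.

    freqs -- frequency of the allele in each subpopulation
    sizes -- size of each subpopulation (used as weights)
    Note: undefined (division by zero) if the allele is fixed or absent everywhere.
    """
    total = sum(sizes)
    # Size-weighted mean allele frequency in the total population.
    p_bar = sum(p * n for p, n in zip(freqs, sizes)) / total
    # Size-weighted variance of the allele frequency between subpopulations.
    var_s = sum(n * (p - p_bar) ** 2 for p, n in zip(freqs, sizes)) / total
    # Variance of the allelic state in the total population.
    var_t = p_bar * (1 - p_bar)
    return var_s / var_t


print(fst_from_frequencies([0.5, 0.5], [100, 100]))  # identical subpopulations
print(fst_from_frequencies([0.2, 0.8], [100, 100]))  # moderately differentiated
print(fst_from_frequencies([0.0, 1.0], [100, 100]))  # fixed for different alleles
```

Identical subpopulations give FST = 0, subpopulations fixed for different alleles give FST = 1, and the intermediate case (frequencies 0.2 and 0.8) gives about 0.36.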
Analysis of molecular variance (AMOVA) is a statistical model for the molecular variation in a single species.
The method was developed by Laurent Excoffier, Peter Smouse and Joseph Quattro at Rutgers
University in 1992. One can break populations down into three hierarchical levels: within demes (S),
within groups of demes (G) and the total population (T). Given this structure, one can break Wright’s
F statistics, which are comparisons of allele frequencies between two entities, into three
components: among demes within group (FSG), among groups within the total population (FGT) and
among demes within the total population (FST).
However, molecular data reveals not only the frequency of molecular markers, but can also tell us
something about the amount of mutational differences between different genes. A technique that
could be used to estimate population differentiation by analyzing differences between molecular
sequences rather than assumed Mendelian gene frequencies would therefore be very useful. Thus
AMOVA breaks down the variance components of these three hierarchical levels into фSG
What is a species?
Before we can examine speciation, we first have to ask ourselves: what is a species?
Many biologists have tried to answer that question previously. As a result we have around 26
different ideas of what a species ought to be - from the Morphological species concept of Linnaeus,
to the Biological species concept of Dobzhansky and the Evolutionary Species concept of Simpson.
Many have come close, but all have fallen short.
Why?
Because the continuity of evolution makes a species impossible to define. For every attempt at a
definition, there are some annoying exceptions to the rule. But yet, we know that they exist – we see
them everywhere we look. We ourselves are a different species to chimpanzees and gorillas, even
though we are closely related.
Modes of Speciation
How do new species originate? In other words, how can we understand the origin of species—the
question that occupied, indeed tormented, Darwin for so many years? All around us we see an
astonishing array of different life-forms. How could such a diversity of different species come to be?
Allopatric speciation
This is speciation that occurs when biological populations of the same species are separated by
geography from each other to an extent that prevents or interferes with genetic interchange. This can
be the result of population dispersal leading to emigration, or of vicariant changes such as mountain
formation or island formation. The isolated populations then undergo genotypic and/or phenotypic
divergence as: (a) they become subjected to dissimilar selective pressures; (b) they independently
undergo genetic drift; (c) different mutations arise in the two populations. When the populations
come back into contact, they have evolved such that they
are reproductively isolated and are no longer capable of exchanging genes. Island genetics is the
term associated with the tendency of small, isolated genetic pools to produce unusual traits.
Examples include insular dwarfism and the radical changes among certain famous island chains, for
example on Komodo and the Galápagos islands.
Parapatric Speciation
In the parapatric speciation model, the population of a species constitutes one or more
biogeographically distinct subpopulations with a small, continuous overlap or minimal contact
zone between populations. This minimal contact zone may be the result of unequal dispersal or
distribution, incomplete geographical barriers, or divergent expressions of animal behaviour, among
other things. A parapatric population distribution may result in nonrandom mating and unequal gene
flow, which can then produce an increase in the dimorphism between populations. In parapatric
speciation, there is an
intrinsic barrier of nonrandom mating and distinct selection pressures that create unequal gene
flow. Parapatric speciation is the culmination of this unequal gene flow effect, in
which genotypic dimorphism between populations results in speciation of the population and
redefines the population as two or more sister species.
Sympatric speciation
This is the process through which new species evolve from a single ancestral species while inhabiting
the same geographic region. The mechanisms for sympatric speciation are unclear. Disruptive
selection has often been invoked, but rarely demonstrated. Horseshoe bats are thought to speciate
sympatrically by changing their calls. Cichlid fish in central Africa may have speciated sympatrically
via sexual selection.
Behavioral Isolation: patterns of courtship may be altered to the extent that sexual
union is not achieved (for example: albatross courtship rituals).
Post-zygotic isolating mechanisms may appear first since the two populations are recently diverged
and may still be similar. However, these are costly in terms of time, energy, lost reproductive
opportunities, and fitness. Therefore, selection should ultimately favour pre-zygotic
mechanisms.
Most species are separated from one another by more than one pre-zygotic mechanism.
Hybrid speciation
This is a form of speciation wherein hybridization between two different species leads to a new
species, reproductively isolated from the parent species. This is not new in plants: we know that
many plants are polyploid, and in many cases this arose through hybridization of two parental species.
Because plants are able to double or even quadruple their chromosome number, they are able to
take advantage of adaptive benefits that polyploidy can confer. On the other hand, examples of
hybrid speciation in animals (termed homoploid hybridisation, as it does not result in a change in
chromosome number) were rare until recently. Ernst Mayr believed that hybridisation had little to
no role to play in animal speciation because the fitness of hybrid phenotypes would almost certainly
be less than that of either of the two parental phenotypes. Among the few known examples are
insects and fish. However, the great skua has a surprising genetic similarity to the physically
dissimilar pomarine skua, and most ornithologists now assume the great skua is a hybrid species
between the pomarine skua and one of the northern skua species. Mayr’s view on hybrid speciation
looks set to change in the new era of genomics, where we are able to look into all the genetic
material and unlock all its secrets.
Adaptive introgression
A classic example of adaptive introgression is found in the South American butterfly genus Heliconius. In
this case, one species has gained an adaptive advantage in mimicking the colour pattern of an
inedible non-sister taxon. This is conclusive evidence that different animal species can hybridise and
that hybrid fitness can be greater than that of the parents.
Part of the answer may lie in reproductive isolation, which must by definition be incomplete if
introgression can occur. Perhaps the rapid evolution that occurs during an adaptive radiation is
enough to evolve distinct phenotypes (species) but not for the evolution of complete reproductive
isolation. It may also be that rapidly evolving species suffer from losses in genetic variation, and
could thus benefit from an influx of hybrid genetic diversity. Whatever the case, we will discover it in
the next few years...
The coalescent process is a powerful modeling tool for population genetics. The allelic states of all
homologous gene copies in a population are determined by their own genealogical and mutational
history. Its history can be traced back to the Wright-Fisher model - a very simple mathematical
model of the evolution of a population. It says that we have a set of discrete non-overlapping
generations, where each new generation is sampled from the previous by sampling at random with
replacement.
So you start out with a set of N individuals in the first generation and then you create the next
generation by selecting, N times, a parent from the first generation at random and copying
him/her to the next generation.
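The sampling step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the population size of 10, the 50 generations and the random seed are assumptions chosen for the example, not values from the module. Each founder is given a unique label so that you can follow how many founder lineages still have descendants in each generation.

```python
import random


def wright_fisher_generation(parents, rng):
    """Build the next generation by picking a parent at random,
    with replacement, once for each of the N offspring."""
    n = len(parents)
    return [rng.choice(parents) for _ in range(n)]


def simulate(n_individuals, n_generations, seed=0):
    """Return a list of generations; each generation is a list of founder labels."""
    rng = random.Random(seed)
    # Label every founder uniquely (hypothetical labels 0..N-1).
    population = list(range(n_individuals))
    history = [population]
    for _ in range(n_generations):
        population = wright_fisher_generation(population, rng)
        history.append(population)
    return history


history = simulate(n_individuals=10, n_generations=50, seed=1)
# Number of founder lineages that still have copies in each generation.
surviving = [len(set(generation)) for generation in history]
print(surviving)
```

Because every generation is drawn only from the one before it, the number of surviving founder lineages can never increase. Read backwards in time, the same process is what makes gene copies coalesce into shared ancestors.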
How?
Consider the figure below, which shows the coalescent process for a single gene going
backwards in time for three phenotypically different species A, B and C. Alleles are coloured
differently to show when they mutated from one form to the other. Now if you sampled members of
each species today, and sequenced that gene, you would find that your gene genealogy placed them
in reciprocally monophyletic clades, corresponding to three distinct species. You would be very
happy and you would assume that the gene genealogy you produced was a true reflection of the
true species tree – which is the big fat tree that enclosed the gene genealogy. However, if you had
sampled them 5 generations into the past, you would find that your gene genealogy was
paraphyletic – although these look phenotypically like different species, your tree would not
separate them into monophyletic species clades. This is because of a phenomenon called incomplete
lineage sorting (or trans-species polymorphism).
The problem is that in most cases, you will not happen to sample your three species after the lineages
have sorted. This is because different species are at different stages of their evolution and it is just a
matter of chance what you discover by sequencing just one gene. Many studies that use only
mitochondrial DNA to infer evolutionary history have this problem. This is why, to obtain an estimate
of the true species tree, you have to sequence several different genes, and then perform coalescent
modelling to obtain the most likely species tree.
Molecular ecology is the field of biology that merges genetics with ecology. It brings together the
theoretical approaches of population genetics and phylogenetics, which we have learned about in
previous lectures, with the more applied fields of phylogeography and conservation genetics – which
we will examine in this section.
Molecular Ecology has now come of age and is primed to become a science in its own right. The
technological advances in the last two decades have seen an unprecedented rise in molecular
ecological studies. Importantly, recent advances such as next generation sequencing, multiprocessor
computing, sophisticated analytical software, bioinformatics and GPS tracking now make it possible
to study wild populations in unprecedented detail.
Section Outcomes.
1. Explain how a fluctuating palaeoclimate can structure genetic variation with examples
2. Evaluate the out-of-Africa model for human migration in light of recent evidence from next
generation sequencing
3. Identify the reasons for conserving biodiversity
4. Demonstrate the effect of a bottleneck on the genetic diversity of a population
5. Describe the consequences of losing genetic diversity
6. Evaluate the concept of the evolutionary significant unit, why we need them and how best
to define them.
7. Explain with examples why taxonomic issues are important in conservation
8. Compare conservation genetics with conservation genomics
Phylogeography is the study of the historical processes that may be responsible for the
contemporary geographic distributions of individuals. This is accomplished by considering the
geographic distribution of individuals in light of the patterns associated with a gene genealogy.
This term was introduced to describe geographically structured genetic signals within and among
species. An explicit focus on a species' biogeography/biogeographical past sets phylogeography
apart from classical population genetics and phylogenetics.
Past events that can be inferred include population expansion, population bottlenecks, vicariance
and migration. As you know, these events all have associated consequences because of genetic drift,
migration and selection. Recently developed approaches integrating coalescent theory or the
genealogical history of alleles and distributional information can more accurately address the
relative roles of these different historical forces in shaping current patterns.
The term phylogeography was first used by John Avise in 1987. Early phylogeographic work has
recently been criticized for its narrative nature and lack of statistical rigor (i.e. it did not statistically
test alternative hypotheses). The only real method was Alan Templeton's Nested Clade Analysis,
which made use of an inference key to determine the validity of a given process in explaining the
concordance between geographic distance and genetic relatedness. Recent approaches have taken a
stronger statistical approach to phylogeography than was done initially.
The field of comparative phylogeography seeks to explain the mechanisms responsible for the
phylogenetic relationships and distribution of different species. Comparisons across
multiple taxa can clarify the histories of biogeographical regions. For example, phylogeographic
analyses of terrestrial vertebrates on the Baja California peninsula and marine fish on both the
Pacific and gulf sides of the peninsula display genetic signatures that suggest a vicariance event
affected multiple taxa during the Pleistocene or Pliocene.
The figures below map out the phylogeographic history of poison frogs in South America.
Phylogeography integrates biogeography and genetics to study in greater detail the lineal history of
a species in context of the geoclimatic history of the planet. An example study of poison frogs living
in the South American neotropics (illustrated to the left) is used to demonstrate how
phylogeographers combine genetics and paleogeography to piece together the ecological history of
organisms in their environments. Several major geoclimatic events have greatly influenced the
biogeographic distribution of organisms in this area, including the isolation and reconnection of
South America, the uplift of the Andes, an extensive Amazonian floodbasin system during the
Miocene, the formation of Orinoco and Amazon drainages, and dry−wet climate cycles throughout
the Pliocene to Pleistocene epochs. Using this contextual paleogeographic information
(paleogeographic time series is shown in panels A-D) the authors of this study proposed a null
hypothesis that assumes no spatial structure and two alternative hypotheses involving dispersal and
other biogeographic constraints (hypotheses are shown in panels E-G, listed as SM0, SM1, and SM2).
The phylogeographers visited the ranges of each frog species to obtain tissue samples for genetic
analysis; researchers can also obtain tissue samples from museum collections.
The evolutionary history and relations among different poison frog species are reconstructed using
phylogenetic trees derived from molecular data. The molecular trees are mapped in relation to
paleogeographic history of the region for a complete phylogeographic study. The tree shown in the
Human phylogeography
Phylogeography has also proven to be useful in understanding the origin and dispersal patterns of
our own species, Homo sapiens. Based primarily on observations of ancient human skeletal
remains and estimations of their age, anthropologists proposed two competing hypotheses
about human origins.
In his famous book, Frankham defined conservation genetics as “the application of genetics to
preserve species as dynamic entities capable of coping with environmental change.”
Conservation genetics is a large and rapidly growing field of biology. It covers a
wide range of topics: inbreeding depression, loss of genetic diversity, reduced gene flow, genetic
drift, accumulation of deleterious alleles, genetic adaptation to captivity, resolution of taxonomic
uncertainties, definition of management units, forensic application, understanding species
biology and outbreeding depression.
Genes: Wild animals and plants are sources of genes for hybridization and genetic engineering.
Food sources: Animals, plants, mushrooms, etc.
Natural products: medicines, fertilizers, and pesticides we use are derived from plants and animals.
We also get products such as oils, adhesives, and silk from natural sources.
Environmental services: We rely on plants and animals for important processes such as soil aeration,
fertilization, and pollination.
Scientific interest: The diversity of plants and animals inspires scientific inquiry in many different
realms. Evolutionary science, anatomy, physiology, behaviour, and ecology are only a few examples.
Self-perpetuation: Biologically diverse ecosystems help to preserve their component species,
reducing the need for future conservation efforts targeting endangered species.
Conservation genetics includes understanding the relationships and diversity that make up
biodiversity. It need not be a directly applied science: addressing questions about how diversity
arises may assist the planning of viable conservation strategies more than direct conservation does.
What does this tell us about the important processes for creating and maintaining diversity?
Frankham’s view appears excessively optimistic given the challenges we face in the modern world.
Habitat loss is a massive problem. In many cases 95% or more of habitats have already been
destroyed by man, and yet the human population continues to grow, requiring ever more resources.
As a result of this, wild animal population numbers have declined by more than 50% since 1970. In
many cases the decline is over 90%.
In either case, genetic diversity will generally remain lower, only slowly increasing with time as
random mutations or gene flow from other populations occur. As a consequence of such population
size reductions and the loss of genetic variation, the robustness of the population is reduced, as is its
ability to survive selective environmental changes, such as climate change or a shift in available
resources.
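To get a feel for how quickly a small population loses diversity, you can use the standard drift expectation for an idealised diploid population, H_t = H_0 x (1 - 1/(2N))^t, where H_0 is the starting heterozygosity, N is the effective population size and t is the number of generations. This formula and the numbers below are assumptions added for illustration; they are not taken from the lecture text, and real populations will deviate from the idealised model.

```python
def expected_heterozygosity(h0, n_effective, generations):
    """Expected heterozygosity after `generations` of genetic drift in an
    idealised diploid population: H_t = H_0 * (1 - 1/(2N))**t."""
    return h0 * (1 - 1 / (2 * n_effective)) ** generations


# Hypothetical example: same starting diversity, 20 generations,
# one large population and one that has gone through a bottleneck.
large = expected_heterozygosity(h0=0.5, n_effective=1000, generations=20)
small = expected_heterozygosity(h0=0.5, n_effective=10, generations=20)
print(f"N = 1000: H = {large:.3f}")
print(f"N = 10:   H = {small:.3f}")
```

The bottlenecked population retains noticeably less heterozygosity than the large one over the same number of generations, which is the pattern described in the paragraph above.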
Inbreeding
Inbreeding is the mating of individuals that are related by descent. Offspring resulting from such
The population inbreeding co-efficient (FIS) can be calculated from co-dominant genetic data such as
microsatellites. It gives you an idea of the proportion of homozygous individuals in a population.
FIS = (He - Ho) / He, where Ho is the observed frequency of heterozygotes and He is the
frequency of heterozygotes expected under Hardy-Weinberg equilibrium
An FIS > 0 denotes heterozygote deficiency, whereas FIS < 0 denotes heterozygote excess. One can
also use FIS to determine if population allele frequencies meet Hardy-Weinberg expectation.
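As a worked example, the sketch below computes FIS for a single co-dominant locus with two alleles, using the standard definition FIS = (He - Ho) / He with He = 2pq. The function name and the genotype counts are hypothetical and chosen only to show the three possible outcomes; real data sets usually have many loci and more than two alleles per locus, and are analysed with dedicated software.

```python
def f_is(n_aa, n_ab, n_bb):
    """Inbreeding coefficient FIS for one biallelic co-dominant locus,
    computed from counts of the three genotypes AA, AB and BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                          # frequency of allele B
    h_obs = n_ab / n                   # observed heterozygosity (Ho)
    h_exp = 2 * p * q                  # Hardy-Weinberg expectation (He)
    if h_exp == 0:
        raise ValueError("locus is monomorphic; FIS is undefined")
    return (h_exp - h_obs) / h_exp


# Hypothetical genotype counts (AA, AB, BB) for 100 individuals each.
print(f_is(25, 50, 25))   # matches Hardy-Weinberg expectation
print(f_is(40, 20, 40))   # fewer heterozygotes than expected
print(f_is(10, 80, 10))   # more heterozygotes than expected
```

The first population gives an FIS of zero, the second a positive value (heterozygote deficiency) and the third a negative value (heterozygote excess).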
Since conservation depends on politicians and money, we need to define units of biodiversity that
need to be conserved, because then we can try to lobby politicians to spend money on a defined,
discrete entity. A major question is: should we be conserving species? As you have learned already,
“species” alone is a very simplistic concept given the complexity of the real world. Species are not really
discrete entities. And if we decide to conserve species, we may ask: Is biodiversity only the
number of species present? And: are all species equally relevant for biodiversity? And: Must
biodiversity be determined entirely by our species definition?
Evolutionary significant units (ESUs) are genetically differentiated populations that have a high
priority for separate management and conservation. They are closely related (sometimes synonymous) to:
subspecies
distinct population segments (DPS) - Endangered Species Act (USA)
Many authors suggest that ESUs, subspecies and DPS all merit separate management and have a
high priority for conservation. The fundamental idea is that conservation should aim to preserve
evolutionary processes and adaptive potential, not just current species without regard to losing
significant variation within species. Because if variation within a species is lost, the species loses its
adaptive potential.
But it is difficult to define an ESU. What does “genetically differentiated” mean? Originally, it was defined
by reproductive isolation and then by ecological distinctness. In 1994 Craig Moritz proposed a
totally genetic definition:
The Moritz criteria have been used in dozens of cases to define genetic conservation units. A recent
study on imperiled cave crayfish in the Appalachian Mountains of eastern North America
demonstrates how phylogenetic analyses along with geographic distribution can aid in recognizing
conservation priorities. Using mtDNA, the authors found that hidden within what was thought to be
a single, widely distributed species, an ancient and previously undetected species was also present.
Conservation decisions can now be made to ensure that both lineages receive protection. Results
like this are not an uncommon outcome from phylogeographic studies.
An analysis of salamanders of the genus Eurycea, also in the Appalachians, found that the current
taxonomy of the group greatly underestimated species level diversity. The authors of this study also
found that patterns of phylogeographic diversity were more associated with historical (rather than
modern) drainage connections, indicating that major shifts in the drainage patterns of the region
played an important role in the generation of diversity of these salamanders. A thorough
understanding of genetic structure will thus allow informed choices in prioritizing areas for
conservation.
Unfortunately, although potentially very useful for conservation management, the ECU scheme has
as yet never been implemented. This is because it requires a lot of non-genetic data about the
populations involved, which is not available for many species, including high profile endangered
species.
Taxonomy is very important for determining what is out there to be conserved. While some groups
of organisms, such as birds and mammals, have received much attention in this respect, the
relationships between taxa in other groups, especially invertebrates and bacteria are less well
known.
As conservation decisions are very often based around conservation units such as species,
subspecies, ESUs, DPSs, etc, it is very important that the taxonomic status of a population is correctly
assigned. However, these units on which conservation decisions are based are difficult to define, as
shown in the last lecture. In addition, subspecies are often accorded legal protection in many
countries including South Africa. The subspecies concept is even more subjective and controversial
than the species concept. The difficulty lies in the fact that there are no sharp boundaries between
what represents a species, a subspecies and mere subpopulations.
Two subspecies can possibly be viewed as being two populations part of the way towards full
speciation. Sequences of conserved genes, such as sections of the mitochondrial genome, are widely
used to determine species and subspecies status. The sequences are used to build phylogenetic
trees as previously described. When discrimination between more closely related groups (between
populations) is required, faster evolving bits of the genome, such as microsatellites, are often used.
Incorrect taxonomy
But how large does genetic divergence have to be before we are dealing with “good species”?
Although genetic data exist for many species, the vast majority of species (and subspecies) have
been defined only by taxonomists, often dealing
with small sample sizes (in museums) with limited or
patchy geographic coverage of the species range.
For most species, their legal status is still based on
taxonomic classifications. Serious problems can
arise when management decisions are taken based
on incorrect taxonomic classification.
Wildlife forensics
The utility of genetic markers for taxonomic purposes has also been exploited for a number of
forensic applications such as the tracking of rare or elusive animals, or the identification of species
from bits of tissue. The RhoDIS database of African rhinoceros, held at the University of Pretoria, is a
good example. Here, they are building a genetic database of all living rhino in Africa. With these
microsatellite genetic profiles, they are able to identify rhino products such as confiscated horns,
down to the individual level. This evidence has often been used to convict poachers in courts of law.
Forensic applications are primarily a result of our ability to amplify tiny amounts of DNA using the
polymerase chain reaction. The small amount of DNA contained in hair shed by animals or even
faeces (though this is a smelly and messy business) is often sufficient. The tissue samples used in
forensic applications are often highly degraded and contain minuscule amounts of DNA. For this
reason, mitochondrial genes are often amplified from such tissue samples as more copies of mtDNA
are present per cell compared to nuclear DNA. The Pyrenean brown bear and the hairy-nosed
wombat are cases where hair and/or faecal samples have been used for tracking purposes. The
identification of protected whale species in commercially sold food items is another example where
genetic tools have been used successfully for conservation purposes.
It is becoming increasingly obvious that the identification of species or populations for conservation
prioritisation depends on the balance between neutral and adaptive divergence. Populations that
are more highly distinct, both in terms of adaptive and neutral variation, are more highly prioritised
for conservation effort.
Conservation genomics incorporates the latest technological advances of the genomics revolution.
Genomics approaches, which have been revolutionising all fields of biology recently, can offer
important insights into a number of challenges faced by conservation biology such as identification
of functionally important genomic variation and an improved understanding of the mechanisms
behind important conservation genetic processes such as inbreeding depression.
Recently, new technical developments have opened the way to ask and answer new questions. The
invention of next generation sequencing (NGS) techniques enables the collection of genome-wide
information on genetic variation. NGS also facilitates genomic studies of non-model species that
lack data on the genome and transcriptome. This revolutionises the field of conservation genetics in
the following ways:
1. Applying NGS techniques will give estimates of genetic variation across the entire genome, instead
of estimates of variation based on a limited set of markers.
2. Information on variation in thousands of single nucleotide polymorphism (SNP) markers allows a
population genomic approach, which enables signals of selection and adaptation to be identified.
SNP markers associated with selection can be investigated in small populations, which may lead to
evaluations of the balance between genetic drift and selection.
3. NGS allows the study of gene expression rather than the study of sequence variation.
Transcriptomic studies will aid in identifying genes of adaptive importance, and will help
considerably in investigating the mechanisms of processes that are important in a conservation
genetic context (such as inbreeding depression and local adaptation).
Genomics knowledge and facilities are very unevenly distributed across countries