Gene Hunting

Lecture 1
Strategies for Identifying Disease Genes
Candidate Gene Approach
If you have some idea of the pathological basis of the disease, or if there is a similar
animal or human disease whose basis is known, it may be possible to guess what the
gene might be and sequence that gene directly in patients.
Positional Approach
Determine the location of the disease gene by linkage analysis or genome-wide studies.
Positional Cloning
Identification of a disease gene through its location in the human genome, without prior
knowledge of its function.
Positional cloning can be achieved by linkage analysis (genetic mapping) or association
studies, or by the identification of chromosomal abnormalities associated with the
disease.
This would be followed by the analysis of candidate genes within the disease-associated genomic
region.
Linkage and Recombination

Mapping disease genes in humans is done using ‘Polymorphic DNA markers’
(polymorphisms).
These are DNA variations between individuals; most of them are not themselves disease
causing.
Known, characterised human genetic polymorphisms (markers) may lie close to unknown,
uncharacterised polymorphisms that cause disease (mutations).
Genetic Markers: Useful DNA variants at known chromosomal locations that can be genotyped
easily, using simple lab techniques (e.g. PCR).
Meiotic Recombination
Closely linked genes are unlikely to experience a crossover between them during meiosis.
A genetic distance of 1 centimorgan (1 cM) is the distance between 2 genes that show a 1%
rate of recombination.
In 99% of meioses, the alleles of the two genes are inherited together. 1 cM corresponds to
roughly 1-2 Mb of DNA in mammals.
Genome-Wide Scans for Disease Genes
Uses easily assayed genetic markers scattered throughout the genome and looks at the
pattern of inheritance of alleles of these markers.
The markers whose alleles are most commonly co-inherited with disease are those that
are most closely linked to a disease gene.
The markers are typically ‘Microsatellites’ or ‘Single Nucleotide Polymorphisms’ (SNPs).
In a specific genetic disease, polymorphisms are studied in family members to find genetic
linkage.
If the polymorphism is close to (or in) the disease gene on the chromosome, there is a low
chance of recombination between the marker and the disease-causing mutation at meiosis,
and linkage is observed.
If the polymorphism and disease gene are far apart or on different chromosomes, linkage is
not observed.
Recombination Fractions
The recombination fraction ‘θ’ is calculated by dividing the number of offspring with the
recombinant allele combination (R) by the total number of offspring (R + NR): θ = R/(R+NR).
θ is between 0 and 0.5 (0 and 50%).
The closer together the genes are, the smaller the Recombination Fraction (RF).
θ = 0.5 for unlinked genes (very far apart, or on different chromosomes).
θ of 0.01 = 1 cM = 1 Mapping Unit (m.u.)
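The calculation of θ can be sketched in a few lines of Python. The offspring counts are hypothetical, and the conversion to cM assumes the simple rule θ of 0.01 = 1 cM, which is only a reasonable approximation for closely linked loci (small θ).

```python
def recombination_fraction(recombinants, non_recombinants):
    """theta = R / (R + NR)."""
    total = recombinants + non_recombinants
    if total == 0:
        raise ValueError("need at least one offspring")
    return recombinants / total


def approx_map_distance_cm(theta):
    """Approximate genetic distance in cM, assuming 1% recombination = 1 cM.

    Only a reasonable approximation for small theta.
    """
    return theta * 100


# Hypothetical counts: 8 recombinant and 92 non-recombinant offspring.
theta = recombination_fraction(8, 92)
print(f"theta = {theta:.2f}, approx. {approx_map_distance_cm(theta):.0f} cM")
```

With equal numbers of recombinant and non-recombinant offspring the function returns 0.5, the value expected for unlinked loci.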
Polymorphism in Human DNA

Millions of sites in human DNA differ between individuals.
Most polymorphisms are in non-coding DNA, as there is more of it and mutations there are
not selected against.
Genetic markers is the name given to polymorphic sites that can be easily genotyped.
Commonly used markers include Single Nucleotide Polymorphisms (SNPs) and Tandem Repeat
Sequences (Microsatellites).
Satellite DNA
Satellite DNA was identified in the early days, when genomic DNA fragments were
centrifuged in sucrose or CsCl gradients.
We now know it consists of DNA with long stretches of tandem repeat sequence.
In these repeats A:G:C:T is not 1:1:1:1, so satellite DNA has a different density from
non-repeat DNA.
Minisatellites
Tandem repeats of short 10-15bp sequences, with a ‘core’ sequence of ‘GGGCAGGANG’,
scattered through the genome.
‘Hypervariable Minisatellite DNA’: The number of repeat motifs at any one locus is very
variable and there are many alleles in the population.
Southern blotting using probes against the minisatellite core sequence was the original basis
of DNA fingerprinting.
Shared minisatellite bands between parents and children can be used to confirm or exclude
paternity.
Microsatellites
Microsatellites are currently the most significant and widely used genetic markers for most
species. They consist of stretches of (usually) di- or trinucleotide repeats, e.g.
‘CTCTCTCTCTCTCTCTCT’.
The length of each microsatellite is very variable. They appear to be very prone to ‘slippage’
during DNA replication, and perhaps also to unequal crossing over of longer stretches.
Well over 100000 microsatellites have been identified and mapped through the human genome,
and their length can be assayed by PCR using unique primers that flank each microsatellite
sequence. Different alleles produce different sized bands after PCR.
Whatever microsatellite you want to use as a marker, you can go online and buy the primers
you need.
Primers are specific, so they only amplify the one microsatellite locus.
Any particular microsatellite is likely to have many different alleles (i.e. length polymorphisms)
within the wider population.
After PCR, these alleles can be assayed as different sized bands on a gel, or by running through
capillary tubes.
This is similar to running sequencing reactions: the exact length is read as a peak coming
through the machine.
Most microsatellites lie outside genes or in introns and are purely used as markers. Some lie
within genes, e.g. the trinucleotide repeats underlying Huntington’s disease or myotonic
dystrophy are microsatellites.
Example
Autosomal Dominant Disease.
A microsatellite marker is closely linked to the gene and can be assayed by PCR.
The affected dad is heterozygous for the microsatellite (184bp and 200bp) and the mutant
disease gene allele segregates with the closely linked 200bp microsatellite allele.
The disease allele of the gene is linked to the 200bp allele of the microsatellite (this can then
be used for prenatal screening of the affected daughter’s children in this family).
Microsatellites can be used on a much larger scale as linkage markers to find disease genes.
Autozygous Mapping:
Autosomal Recessive Disease.
In inbred families, there will be a conserved haplotype of linked markers that co-segregate
with the disease.
Can trace this haplotype and see which genes lie within it.
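The idea of tracing a conserved haplotype can be sketched as a search for runs of consecutive markers at which every affected individual is homozygous for the same allele. This is only an illustration: the function name, the `min_markers` cut-off and the genotype data are hypothetical, and real autozygosity mapping software also takes map distances and genotyping errors into account.

```python
def shared_homozygous_runs(genotypes_by_person, min_markers=3):
    """Return (start, end) index pairs (inclusive) of runs of consecutive markers
    where all individuals carry the same homozygous genotype.

    genotypes_by_person: one list per affected individual, each holding
    (allele1, allele2) tuples in chromosome map order.
    """
    n_markers = len(genotypes_by_person[0])
    shared = []
    for i in range(n_markers):
        calls = {person[i] for person in genotypes_by_person}
        only = next(iter(calls))
        # Same call in everyone, and that call is homozygous.
        shared.append(len(calls) == 1 and only[0] == only[1])

    runs, start = [], None
    for i, ok in enumerate(shared + [False]):  # sentinel closes a final run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_markers:
                runs.append((start, i - 1))
            start = None
    return runs


# Hypothetical genotypes at 8 linked markers in 3 affected relatives.
affected = [
    [(1, 2), (3, 3), (2, 2), (5, 5), (1, 1), (4, 4), (2, 3), (1, 1)],
    [(1, 1), (3, 4), (2, 2), (5, 5), (1, 1), (4, 4), (2, 2), (1, 2)],
    [(2, 2), (3, 3), (2, 2), (5, 5), (1, 1), (4, 4), (3, 3), (1, 1)],
]
print(shared_homozygous_runs(affected))
```

In this made-up family the shared homozygous block covers markers 2 to 5, so the genes lying between those markers would be the candidates.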
Single Nucleotide Polymorphisms (SNPs)
As humans, the vast majority (99.9%) of our DNA sequence is identical, even between different
ethnic populations.
There are many loci where differences exist: bits of sequence where different people have a
different base. For example, 85% of alleles have an A and 15% of alleles have a C at a
particular base pair.
This would be a single nucleotide polymorphism. Over 5000000 have been identified
and characterised in the human genome.
These can be within or outside our genes.
A Single SNP may have different allele frequencies in different human populations: 85% of
people may have A/A in western Europe, but 90% in Africa, and 65% in Asia.
SNPs can also be used as genetic markers for gene mapping and disease association.
We have known and used SNPs since before they were named. They are the basis for ‘Restriction
Fragment Length Polymorphisms’ (RFLPs).
Some SNPs by chance create or destroy restriction sites, so that when the DNA of interest is
digested, you get different sized bands depending on which SNP allele is present.
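A small sketch of how a SNP inside a restriction site changes the band pattern. EcoRI (site GAATTC, cutting after the G) is used as the example enzyme; the 40bp sequences and the position of the SNP are made up for illustration.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of a restriction site.

    cut_offset is the position within the site where the enzyme cuts.
    Returns the fragment lengths in order along the sequence.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


# EcoRI recognises GAATTC and cuts between G and A (offset 1).
ECORI_SITE, ECORI_CUT = "GAATTC", 1

# Hypothetical region: the two alleles differ at one base (A vs G) inside the site.
flank_left = "ACGTACGTACGTAC"         # 14 bp
flank_right = "TTGCATGCATGCATGCATGC"  # 20 bp
allele_1 = flank_left + "GAATTC" + flank_right  # site present
allele_2 = flank_left + "GAGTTC" + flank_right  # site destroyed by the SNP

print("allele 1 fragments:", digest(allele_1, ECORI_SITE, ECORI_CUT))
print("allele 2 fragments:", digest(allele_2, ECORI_SITE, ECORI_CUT))
```

Allele 1 is cut into two fragments while allele 2 stays as a single uncut fragment, which is the size difference seen as different bands on a gel.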
Microsatellites vs SNPs
There are many more SNPs identified than microsatellites, which allows for much finer mapping.
The majority of SNPs have only 2 alleles, whereas microsatellites can have multiple alleles
(cf. above).
Microsatellites are easy to assay by PCR.
SNPs have to be either directly sequenced or analysed by techniques such as hybridisation
to SNP chips, which are more complex than PCR.
Genome Scans
The problem with human genetics is that you can’t set up the crosses and control who mates with
whom.
All you have is pedigrees, which may be incomplete, and genetic data may not be
forthcoming from all individuals in the pedigree.
A genome scan is the genotyping of a collection of families with the genetic disease using
hundreds of genetic markers from all over the genome.
Using hundreds of markers ensures the unknown gene will be close enough to one or two of
them to show genetic linkage.
The aim may be to find linkage with several markers. Then you would know that the disease
gene must be in the candidate region of the genome around those markers.
LOD Scores (Log of the Odds)
Experimental animals such as mice or fruit flies produce very large numbers of offspring, so the
recombination frequency ‘θ’ can be estimated very accurately.
Human families only produce small numbers of children, so to get statistically significant
evidence for linkage, evidence from different families is combined.
A mathematical procedure, implemented by computer software, is used to generate these
LOD scores.
The LOD score is a statistic that describes the strength of evidence for linkage, at any
chosen value of θ, given the family data available.
The recombination fraction ‘θ’ can be calculated for any two loci by linkage analysis. θ varies
between 0 (100% linkage) and 0.5 (no linkage).
The likelihood ratio at a given value of ‘θ’ equals the likelihood of the observed data if the loci
are linked at recombination fraction θ, L(θ), divided by the likelihood of the observed data if the
loci are not linked (θ = 0.5), L(0.5).
The logarithm (base 10) of this ratio is the LOD score (Z).
LOD(θ) = log10 [L(θ)/L(0.5)]
If a particular disease gene is linked to a DNA marker with a LOD score (Z) of 4 at
recombination fraction θ = 0.05, it means that in the families studied it is 10000 times more
likely that the disease and marker are linked roughly 5cM apart than that they are not linked.
Produce a plot of LOD scores for different values of θ, based on observed data.
A LOD score of 3 or more is considered good evidence for linkage.
A LOD score of -2 or less is evidence against linkage.
Values between -2 and 3 are inconclusive and indicate that more data must be obtained.
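A minimal sketch of how a table of LOD scores against θ could be produced. It assumes the simplest case, phase-known meioses that can each be scored as recombinant (R) or non-recombinant (NR), so that L(θ) = θ^R × (1 − θ)^NR and L(0.5) = 0.5^(R+NR). The counts are hypothetical; real linkage software handles unknown phase and missing genotypes.

```python
import math


def lod_score(theta, recombinants, non_recombinants):
    """LOD(theta) = log10[L(theta) / L(0.5)] for phase-known meioses."""
    n = recombinants + non_recombinants
    if theta == 0 and recombinants > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    likelihood_linked = theta ** recombinants * (1 - theta) ** non_recombinants
    likelihood_unlinked = 0.5 ** n
    return math.log10(likelihood_linked / likelihood_unlinked)


def interpret(z):
    """Apply the thresholds from the notes: >= 3 for linkage, <= -2 against."""
    if z >= 3:
        return "evidence for linkage"
    if z <= -2:
        return "evidence against linkage"
    return "inconclusive"


# Hypothetical pooled data: 2 recombinants in 20 informative meioses.
R, NR = 2, 18
thetas = [0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
for theta in thetas:
    z = lod_score(theta, R, NR)
    print(f"theta = {theta:<4}  Z = {z:6.2f}  {interpret(z)}")

best = max(thetas, key=lambda t: lod_score(t, R, NR))
print("Z peaks at theta =", best)
```

The peak of the curve falls at θ = R/(R+NR), and Z is always 0 at θ = 0.5 because the two likelihoods are then identical.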
LOD Graph
A LOD score of 3 implies a 1000x likelihood of linkage over chance; however, we have 46
chromosome arms (23 chromosomes with p and q arms).
Take any two random loci in the human genome: the prior probability of them being on
different chromosome arms (unlinked) is around 50x more than the probability that they are
linked.
Non-linkage is 50x more likely by chance than linkage, so with a LOD score of 3 (1000:1) the
overall odds in favour of linkage are 20 to 1 (1000/50).
So a LOD score of 3 will be spurious about 1 in 20 times:
P = 0.05, i.e. a 95% chance that it is genuine linkage.
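The arithmetic can be checked with a short sketch that divides the likelihood ratio (10 to the power of the LOD score) by the prior odds against linkage. The 50:1 prior is the figure quoted in these notes; the function names are illustrative only.

```python
def posterior_odds_of_linkage(lod, prior_odds_against=50):
    """Likelihood ratio (10^LOD) divided by the prior odds against linkage."""
    likelihood_ratio = 10 ** lod
    return likelihood_ratio / prior_odds_against


def probability_of_linkage(lod, prior_odds_against=50):
    """Convert the posterior odds into a probability that the linkage is genuine."""
    odds = posterior_odds_of_linkage(lod, prior_odds_against)
    return odds / (1 + odds)


for lod in (2, 3, 4):
    odds = posterior_odds_of_linkage(lod)
    p = probability_of_linkage(lod)
    print(f"LOD {lod}: odds {odds:g}:1 for linkage, P(linked) = {p:.3f}")
```

For a LOD score of 3 this gives odds of 20:1, i.e. about a 95% chance of genuine linkage, which is why 3 is used as the conventional threshold.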
LOD Calculation Example
Autosomal Dominant Pedigree.
A and B are alleles of a locus that is suspected of being linked to disease.
Find the value of Z for proposed θ = 0.05.
It looks like the disease is carried with the B allele. So for the 4 progeny in generation III
from the pairing in generation II, the probability that the observed [B and Disease] will occur
in 2 individuals and [A and Healthy] in the other 2 is (1 − 0.05)^4.
If the loci are not linked (θ = 0.5), the probability that the observed [B and Disease] will
occur in 2 individuals and [A and Healthy] in the other 2 is (0.5)^4.
The LOD score for the family at θ = 0.05 is therefore log10[(0.95)^4/(0.5)^4] = 1.12.
The LOD score for the family at θ = 0 is log10[(1)^4/(0.5)^4] = 1.2.
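The numbers in this example can be reproduced with a short sketch. It also illustrates why families are pooled: LOD scores from independent families at the same θ add together. The function name and the idea of several identical families are illustrative assumptions, not part of the pedigree shown in the lecture.

```python
import math


def family_lod(theta, n_offspring):
    """LOD for a family in which all n offspring are non-recombinant."""
    return math.log10((1 - theta) ** n_offspring / 0.5 ** n_offspring)


z_005 = family_lod(0.05, 4)
z_0 = family_lod(0.0, 4)
print(f"Z at theta = 0.05: {z_005:.2f}")
print(f"Z at theta = 0:    {z_0:.2f}")

# LOD scores from independent families (same theta) add, so several
# small families can together pass the threshold of 3.
families_needed = math.ceil(3 / z_005)
print("Identical families needed for Z >= 3 at theta = 0.05:", families_needed)
```

A single family of this size cannot reach Z = 3 on its own, whatever value of θ is chosen.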
Multipoint Genetic Markers

Using 1000s of markers, genetic maps have been constructed across the whole genome.
Multipoint mapping uses several markers at once to localise a disease gene, relative to the other
markers in the map.
More efficient process than using one marker at a time.
Multipoint linkage analysis uses computing power to calculate LOD scores for all permutations
of several markers surrounding the disease locus and produces a location score map.
The highest peak is the most likely location of the gene.
Lecture 2
Genome Wide Association Studies
Powerful use of SNPs to identify the location of disease susceptibility genes.
Population-level study: get 100s or 1000s of affected and unaffected individuals and screen
alleles of 500000 - 1000000 SNPs in all of them.
Statistical analysis may show that a particular allele of a certain SNP is over-represented in
patients suffering from a disease.
E.g. a SNP somewhere in the genome which in the general population shows 90% of alleles as
‘T’ and 10% as ‘C’, but in patients suffering with the disease, 80% are ‘C’.
This suggests that the disease gene is somewhere around that SNP.
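The statistical analysis for a single SNP can be sketched as a chi-square test on a 2x2 table of allele counts in cases and controls. The counts below are hypothetical (100 cases and 100 controls, chosen to match the 80% vs 10% ‘C’ frequencies in the example), and 5e-8 is the conventional genome-wide significance threshold rather than a figure from these notes. Real GWAS pipelines also correct for population structure and genotyping quality.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts.

    Each argument is (count of allele 1, count of allele 2).
    Returns (chi_square, p_value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (table[i][j] - expected) ** 2 / expected

    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical allele counts (C, T): 200 alleles in cases, 200 in controls.
cases = (160, 40)     # 80% C in patients
controls = (20, 180)  # 10% C in unaffected individuals

chi2, p = allelic_chi_square(cases, controls)
GENOME_WIDE_THRESHOLD = 5e-8  # conventional cut-off, an assumption here
print(f"chi-square = {chi2:.2f}, p = {p:.2e}")
print("genome-wide significant:", p < GENOME_WIDE_THRESHOLD)
```

The strict threshold is needed because hundreds of thousands of SNPs are tested at once, so some will look associated purely by chance.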
Why does GWAS work?
It is possible, though less likely, that a SNP associated with a disease is itself the disease
mutation, e.g. by inactivating a gene.
More likely, the actual disease mutation is somewhere near the SNP.
Works because SNPs are inherited as part of a haplotype:
The human genome is young, and within populations closely linked SNPs are in linkage
disequilibrium,
i.e. particular alleles of closely linked SNPs tend to be inherited together.
When a new mutation arises, it will remain associated with the surrounding SNP alleles of the
person in whom the mutation arose.
Association (a function of alleles) is not the same as linkage (a function of loci).
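How strongly alleles at two SNPs are inherited together can be quantified with the standard disequilibrium measures D and r². The sketch below computes both from haplotype counts; the haplotype labels and counts are hypothetical.

```python
def linkage_disequilibrium(haplotype_counts):
    """Compute D and r^2 for two biallelic SNPs from haplotype counts.

    haplotype_counts maps the haplotypes "AB", "Ab", "aB", "ab"
    (first letter = allele at SNP 1, second = allele at SNP 2) to counts.
    Both SNPs must be polymorphic, otherwise r^2 is undefined.
    """
    total = sum(haplotype_counts.values())
    freq = {h: c / total for h, c in haplotype_counts.items()}
    p_a = freq["AB"] + freq["Ab"]  # frequency of allele A at SNP 1
    p_b = freq["AB"] + freq["aB"]  # frequency of allele B at SNP 2
    d = freq["AB"] - p_a * p_b     # excess of AB over random expectation
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Hypothetical counts: alleles A and B tend to travel together.
in_ld = {"AB": 60, "Ab": 10, "aB": 5, "ab": 25}
# Hypothetical counts: alleles combine at random (equilibrium).
in_equilibrium = {"AB": 42, "Ab": 28, "aB": 18, "ab": 12}

for name, counts in (("LD", in_ld), ("equilibrium", in_equilibrium)):
    d, r2 = linkage_disequilibrium(counts)
    print(f"{name}: D = {d:.3f}, r^2 = {r2:.2f}")
```

A SNP in strong disequilibrium with an unseen disease mutation shows an association with the disease even though it is not itself the cause.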
Proving it’s the Right Gene
To prove you have the right gene, you need:
1. Detection: Find potentially pathogenic mutations in the gene that segregate with the disease.
2. Validation: Confirm that the mutation(s) you found are pathogenic, e.g. by molecular
analysis in cells or recapitulation of the disease in an animal model.
Mutation Detection (1)

Sequencing technology is so good now that the quickest strategy is often to sequence the entire
few Mb that the disease-causing mutation has been mapped to and look for differences from
the reference human sequence.
Modern chip-based assays can identify small deletions and duplicated regions of the genome
very quickly.
Identification of the PAX6 gene in Aniridia
The eye condition aniridia (“absence of iris”) is caused by heterozygous mutation of the PAX6 gene,
which is expressed in the iris, cornea, lens and retina.
Study 1:
LOD Scores:
For catalase gene and unknown disease gene (AN2 locus): Z=7.27 at θ=0.00.
For microsatellite D11S151 and unknown disease gene (AN2 locus): Z=3.86 at θ=0.10.
Also a strong linkage with the FSHB gene and D11S16.
Enabled identification of a candidate region within which the aniridia gene was presumed to
lie.
Study 2:
‘WAGR’ Syndrome: Wilms’ Tumour, Aniridia, Genito-Urinary Abnormalities, Mental
Retardation.
Associated with 11p13 deletions.
Wilms’ tumour is a genetic childhood kidney cancer. You can get Wilms’ tumour without aniridia
and vice versa, so there are 2 genes.
Chromosome Jumping: Screened libraries to clone random bits of genomic DNA in the
candidate region.
Got random clones of gDNA from the candidate region and looked for ones that had CpG islands
and were highly conserved between different species - potential genes.
Took gDNA from potential genes and screened an embryonic cDNA library to see which
clones represent expressed sequence.
One clone lit up - it encoded a paired domain-containing DNA-binding transcription factor,
PAX6.
Genomic DNA from WAGR and aniridia (non-WAGR) families did not have this bit of
gDNA, meaning that PAX6 was missing.
The deleted aniridia gene, PAX6, is expressed in the eye during development.
Study 3:
The mouse small eye mutation, which causes aniridia, cataracts etc., is also a mutation in Pax6,
i.e. knocking out the gene in an animal model replicates the human disease.
Mutation Detection (2)
Deletions, insertions, or re-arrangements can be detected by restriction enzyme digestion, gel
electrophoresis, Southern blotting and probing the candidate gene, or by PCR of regions of the
candidate gene.
These technologies have been used extensively to find the mutations causing many diseases.
They are now often used in a clinical diagnostic capacity to screen for mutations in known
disease-associated genes.
PCR-Based Screen in CFTR gene:
The 3bp deletion (the ΔF508 mutation) causes CF in many patients.
PCR using forward and reverse primers that span the mutation gives a 98bp product from the normal
allele and a 95bp product from the ΔF508 allele.
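Reading the result of this PCR screen can be sketched as a lookup from the observed product sizes to a genotype. The 98bp and 95bp sizes are those given above; the function name and wording of the results are illustrative. Note that the assay only reports on ΔF508, so a 98bp-only result does not exclude other CFTR mutations.

```python
NORMAL_BP = 98      # product size from the normal allele
DELTA_F508_BP = 95  # product size from the ΔF508 allele (3 bp shorter)


def genotype_from_bands(band_sizes_bp):
    """Interpret the PCR product sizes seen for one individual."""
    bands = set(band_sizes_bp)
    if bands == {NORMAL_BP}:
        return "no ΔF508 allele detected"
    if bands == {DELTA_F508_BP}:
        return "homozygous ΔF508"
    if bands == {NORMAL_BP, DELTA_F508_BP}:
        return "heterozygous ΔF508 carrier"
    return "unexpected band pattern"


for bands in ([98], [98, 95], [95], [120]):
    print(bands, "->", genotype_from_bands(bands))
```

Because the two products differ by only 3bp, a gel or capillary system with high enough resolution is needed to separate them.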
Detecting Small Mutations

Until the human genome was sequenced, very small changes such as single base changes or
insertions/deletions of 2-3bp were hard to detect.
Sequencing a gene used to be quite a challenge.
There are however some shortcuts that can be used to avoid direct sequencing in the first
instance.
Single-Strand Conformation Polymorphism (SSCP)

In SSCP, a DNA fragment is heated to denature the strands, then cooled rapidly on ice,
The single DNA strands will fold up on themselves to form secondary structures (e.g. hairpins),
A mutant allele, even with only 1bp change, may form a different secondary structure from the
wild-type, and this may influence mobility of the fragment during non-denaturing acrylamide gel
electrophoresis,
A difference in mobility relative to a normal control fragment indicates a mutation,
Detection of Copy Number Variation (CNV)
Diseases may be caused by deletions or duplications that increase or decrease the number of
copies of the disease gene,
Extreme Example - Down Syndrome,
Subtle Example - Williams Syndrome, Deletions + Duplications,
Chip, next-gen sequencing or qPCR-based techniques for detecting changes in gene copy
number,
Validation
When you think you have found the disease-causing mutation, how do you know if it is pathogenic
and not just harmless genetic variation?
1. Does it inactivate a gene,
2. If you have a pedigree, check that the disease allele is carried by all affected individuals,
3. Recreate the mutation in wild-type cells or models and show that it causes problems that
explain the human disease state (e.g. CFTR mouse above)
DISC1 + Schizophrenia
Schizophrenia: a mental illness with strong evidence for genetic component,
A lab in Scotland studied a Scottish family with a high incidence of schizophrenia and related
mental disorders.
They had a balanced chromosome 1/11 translocation that segregated with disease.
A maximum LOD score of 6 suggested the breakpoint was very closely linked to the disease gene.
The breakpoint falls in a region of chromosome 11 that has no genes (gene desert), but falls within
an intron of a (then) novel gene on chromosome 1, which they called ‘Disrupted in Schizophrenia
1’ (DISC1).
Together, the fact that the breakpoint cuts DISC1 in two (therefore inactivating it) and the
linkage analysis showing that the breakpoint is likely to be the pathological mutation indicate
that the mutation in DISC1 causes schizophrenia.
A DISC1 knockout was generated in mice; the mice were viable and fertile (like human schizophrenia
patients).
The mice developed abnormal emotional behaviour as assessed by the elevated plus-maze and
cliff-avoidance tests, therefore suggesting that a deficiency of full-length DISC1 may result in
lower anxiety and/or higher impulsivity,
Goes some way to validating the DISC1 mutations as underlying schizophrenia.
A Worst Case Scenario
A small (e.g. point) mutation in non-coding DNA that underlies a human disease,
May for example mutate a transcription factor binding sequence in a gene regulatory element,
From 1000+ genome projects, there are 1000000s of harmless neutral SNPs. If you find a new
SNP close to an identified disease gene which has no mutation in its coding region, is it affecting
gene expression, or is it a neutral polymorphism?
Identification of Gene Regulatory Sequences
1. If a sequence is important, it is probably evolutionarily conserved,

Alignment of genomic sequences from different animals allows us to see the bits of non-
coding DNA, outside genes, that have been conserved over evolutionary time,
Allows you to infer presence of DNA regulatory regions,
e.g. UCSC genome browser, ECR browser,
2. Does your new SNP disrupt the consensus binding sequence of a transcription factor,
rVista will predict whether a particular DNA sequence is likely to be bound by any of a
panel of TFs for which it knows the consensus DNA binding sequence,
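The question in point 2 can be illustrated by checking whether a sequence still matches a consensus written in IUPAC nucleotide codes once the SNP is introduced. This is a simple sketch, not a description of how rVista works internally; the consensus ‘TGASTCA’ and both sequences are chosen purely for illustration.

```python
# IUPAC nucleotide codes: each code lists the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_consensus(site, consensus):
    """True if every base of site is allowed by the consensus code at that position."""
    if len(site) != len(consensus):
        return False
    return all(base in IUPAC[code] for base, code in zip(site, consensus))


def find_sites(sequence, consensus):
    """Return the start positions of all matches to the consensus in sequence."""
    k = len(consensus)
    return [i for i in range(len(sequence) - k + 1)
            if matches_consensus(sequence[i:i + k], consensus)]


CONSENSUS = "TGASTCA"          # illustrative consensus; S = C or G
wild_type = "CCGATGACTCAGGTT"  # hypothetical enhancer fragment
mutant = "CCGATGACTTAGGTT"     # same fragment with a C>T SNP at position 9

print("wild-type sites:", find_sites(wild_type, CONSENSUS))
print("mutant sites:   ", find_sites(mutant, CONSENSUS))
```

Losing the match only suggests that binding might be disrupted; it still needs to be confirmed by a direct assay of binding or enhancer function.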
Direct Assays of Enhancer Function
If your SNP is in a putative evolutionarily conserved gene enhancer region, check if it affects the
ability of the enhancer to drive gene expression.
Direct in vitro or in vivo assays for DNA-protein binding (ChIP, Southwestern blot).
Make reporter DNA constructs with wild-type and mutant alleles of the enhancer.