Targeted Resequencing of Candidate Genes Using Selector Probes
doi:10.1093/nar/gkq1005
Received July 21, 2010; Revised September 22, 2010; Accepted October 7, 2010
*To whom correspondence should be addressed. Tel: +46 18 471 48 16; Fax: +46 18 471 48 08; Email: mats.nilsson@genpat.uu.se
Correspondence may also be addressed to Olle Ericsson. Tel: +46 18 495 31 22; Fax: +46 18 495 31 21; Email: olle.ericsson@olinkgenomics.com
The authors wish it to be known that, in their opinion, the first two authors and last two authors should be regarded as joint First Authors.
general targeting of fragments, hybridization can be combined with an enzymatic discrimination and amplification step, a strategy which is successfully demonstrated in PCR (20). Since it is difficult to perform highly multiplex PCR reactions with high-success rate, massive numbers of single-locus PCRs have to be applied, requiring compartmentalization in small volumes to be cost-effective. This can be achieved by array based approaches (9) or by emulsion PCR (10), both requiring sophisticated liquid handling and amplification systems.
Two approaches have been presented for efficient single-tube massively multiplexed amplification of genomic loci. Both methods are based on ligase-assisted

The last column shows the percentage of the ROI base pairs that were covered in the design.
a CCDS 14 April 2009.
b No CCDS available, CDS used, 14 April 2009.

Preparation of tumor samples
Matched fresh-frozen tumor/benign lung tissues and a FFPE tumor tissue were used in accordance with the Swedish Biobank Legislation and Ethical Review Act (reference 2006/325, Ethical review board in Uppsala). The tissue samples were reviewed by a pathologist and only tumor tissues with a tumor cell fraction >50% were included. DNA was extracted from FFPE and frozen tissue sections using the QIAamp DNA Mini Kit (Qiagen, Hamburg, Germany).

Design and oligonucleotides
Twenty-eight genes (Table 1) known to be mutated in lung and/or colon cancer were chosen for targeted resequencing. A list of coding regions for each gene was
PAGE 3 OF 13 Nucleic Acids Research, 2011, Vol. 39, No. 2 e8
base to avoid the risk of fragment dropout due to mutations in the restriction enzyme or probe-binding site. The design achieved 99% coverage of targeted bases and the missing bases were found to be in or near repetitive elements. Selected restriction fragments are displayed in the supplementary gff-file. Selector probes serving as template for circularization of each chosen fragment were designed using the ProbeMaker software (24). Each selector consists of two Tm balanced sequences of 20–25 nt complementary to the ends of its targeted restriction fragment.
The oligonucleotides (Integrated DNA Technologies) hybridizing to 25 bps on each end of the targeted

Following this step, to further remove non-specifically bound DNA, the beads were washed in 1 M NaCl, 10 mM Tris–HCl (pH 7.5), 5 mM EDTA and 0.1% Tween-20 in a total volume of 200 µl at 46 °C for 30 min with rotation.

Circularization of targeted fragments
To circularize the genomic fragments, the beads were incubated in 1× Ampligase reaction buffer, 0.25 U/µl Ampligase (Epicentre) and 0.1 µg/µl BSA in a total volume of 50 µl. The ligation mix was incubated at 55 °C for 10 min.
Table 2. Sequencing results for each sample showing the total number of reads obtained for each sample and the percentage of them that align
to the human genome build hg18, and the percentage of the hg18 uniquely aligned reads that aligns to the specified region
Manual (P/N 701930 Rev2.), Affymetrix Inc., Santa Clara, CA, USA) and the arrays were scanned using the GeneChip® Scanner 3000 7G. Normal samples analyzed at Uppsala Array Platform were used as a reference set to produce log ratios. The rank segmentation algorithm, similar to the Circular Binary Segmentation (CBS) algorithm (25), in the software Nexus from Biodiscovery was used to segment the data across the genome. The significance threshold for segmentation was set at 1 × 10⁻⁶, also requiring a minimum of 40 probes per segment. The results are shown as segmented log2 ratio in Figure 4a.

Analysis of input DNA amount requirement
A titration of different amounts of DNA, from 100 to 1600 ng (combined for all eight restriction digestions), was subjected to our enrichment protocol. The amplified DNA was sent to GATC Biotech (Constance, Germany) for Illumina GAII sequencing. The 32-mer reads were mapped to our amplified region as described above, but only allowing for four mismatches (-mm 4). The specificity was calculated as the fraction of unique reads that mapped to our amplified region.
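As an illustration only, the specificity defined above and the enrichment factor used in the Results, (sequenced bp in ROI/sequenced bp off ROI)/(size of ROI/size of the genome), can be sketched in a few lines of Python. The function and parameter names are hypothetical and are not taken from the authors' pipeline; the input validation is likewise an assumption of this sketch.

```python
def specificity(on_target_unique_reads: int, unique_reads: int) -> float:
    """Fraction of uniquely aligned reads that map to the amplified region."""
    if unique_reads <= 0:
        raise ValueError("unique_reads must be positive")
    return on_target_unique_reads / unique_reads


def enrichment_factor(bp_in_roi: float, bp_off_roi: float,
                      roi_size: float, genome_size: float) -> float:
    """(sequenced bp in ROI / sequenced bp off ROI) / (size of ROI / size of genome)."""
    if min(bp_off_roi, roi_size, genome_size) <= 0:
        raise ValueError("bp_off_roi, roi_size and genome_size must be positive")
    return (bp_in_roi / bp_off_roi) / (roi_size / genome_size)
```

With purely hypothetical inputs, `specificity(9600, 10000)` gives 0.96, and `enrichment_factor(8e8, 2e8, 1e5, 3e9)` gives roughly 120 000. These numbers are chosen for arithmetic convenience and are not values from the study.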
enrichment factor [=(sequenced bp in ROI/sequenced bp off ROI)/(size of ROI/size of the genome)] of 200 000 (enrichment statistics for all samples are summarized in Table 2). Some amplified fragments will also contain adjacent off ROI sequence. With a strict coding-sequence target definition, 68% of the reads mapped on target, and 83% on target ±50 bp. The reproducibility in coverage between samples is illustrated in Figure 1b and was calculated to 0.98 (R², linear regression). Furthermore, 98% of the targeted bases were covered at >10% of the mean base coverage (mean coverage = 273, Figure 1c and Supplementary Figure S1). To investigate how much input DNA is required for the method, we performed enrichment experiments on different amounts of template DNA (Table 3). We observed that the specificity decreased when <800-ng DNA was added to the reaction, and therefore we chose not to use less DNA than 800 ng per sample in this study.

Table 3. Specificity values obtained for experiments with different amounts of input DNA
Input DNA (ng)    Specificity (a) (%)
1600              96.0
800               91.0
400               83.9
200               75.0
100               66.0
a Percentage of the uniquely aligned reads to hg18 that aligned uniquely to the region specified.

It is of great importance that an enrichment technique introduces minimal distortion of the original allele ratios in the analyzed sample. To assess the ability of the presented Selector protocol to preserve original allele ratios, we sequenced a HapMap trio (NA18506, NA18507 and NA18508) and compared our calls with results made available within the HapMap project (26). Reference allele-frequency analysis was performed using MosaikAligner and the result was compared to the genotypes of the 164 available SNPs in the target sequence in the HapMap database which overlapped with the ROI (Figure 2 and Table 4). The concordance was 100% for covered SNPs within ROI and 99% including targeted SNPs outside of ROI (in total 383).
To investigate the utility of this enrichment technique for analysis of somatic mutations in cancer, we sequenced tumor and matched control samples of two breast-cancer cell lines (HCC1143 and HCC1599). By comparing the variant calls from the tumor and the matched normal tissue, it is possible to distinguish germline variants from somatic mutations. The results from comparing allele ratios are plotted in the lower panels of Figure 3a and b and the calls are summarized in Table 5. The difference in allele ratio between the tumor and normal cell line was >0.3 in 14 ROI positions in HCC1143, and 7 ROI positions in HCC1599. The differences in HCC1143 were due to loss-of-heterozygosity (LOH) in eight genes, caused by allelic amplification (CCND1), deletion [MET, CDKN2A, MRE11A, ATM, NF1 and ERBB2 (HER2)] and copy-
Table 4. SNP concordance with data available from three HapMap samples (NA18506, NA18507 and NA18508) at two coverage thresholds:
covered at least once and covered at least 20 times
Sample    Relation    Coverage    Region    Number of SNPs (a)    Covered (%)    Homozygote (b): Selectors, Hapmap    Heterozygote: Selectors, Hapmap    Concordance (%)
Positions are considered as heterozygote if the reference allele ratio is between 0.35 and 0.85.
a Number of HapMap SNPs overlapping with region. NN positions omitted.
b Reference allele ratio under 0.35 or above 0.85.
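The thresholds in the Table 4 footnotes amount to a simple classification rule, which can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the labels `hom_ref`/`hom_alt`, and the treatment of the boundary values 0.35 and 0.85 as heterozygous are all assumptions made here.

```python
def classify_genotype(ref_ratio: float, low: float = 0.35, high: float = 0.85) -> str:
    """Classify a position from its reference allele ratio.

    Ratios between `low` and `high` (inclusive, by assumption) are called
    heterozygous; ratios outside that interval are called homozygous.
    """
    if not 0.0 <= ref_ratio <= 1.0:
        raise ValueError("ref_ratio must lie in [0, 1]")
    if ref_ratio < low:
        return "hom_alt"   # hypothetical label: homozygous non-reference
    if ref_ratio > high:
        return "hom_ref"   # hypothetical label: homozygous reference
    return "het"


def concordance(calls: dict, truth: dict) -> float:
    """Percentage of shared positions where the call matches the reference genotype."""
    shared = [snp for snp in calls if snp in truth]
    if not shared:
        raise ValueError("no overlapping positions to compare")
    agree = sum(calls[snp] == truth[snp] for snp in shared)
    return 100.0 * agree / len(shared)
```

For example, hypothetical ratios of 0.98, 0.52 and 0.10 would be called `hom_ref`, `het` and `hom_alt`, respectively; comparing such calls against a dictionary of HapMap genotypes keyed by the same SNP identifiers gives a concordance percentage of the kind reported in Table 4.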
to junctions of the circularized fragments, we were able to retrieve information about the performance of individual selectors. We observed worse representation of the longer fragments than the shorter in the FFPE sample compared to the fresh frozen (Figure 5), resulting in more uneven amplification (Figure 1c). However, the difference in coverage between the shortest and the longest fragments in the range 100–300 bp was less than a factor of two, so with a different probe design targeting shorter fragments, evenness of coverage should be improved.

DISCUSSION
In this article, we report on a method for targeted enrichment of a relatively small subset of genes that produces the targeted regions in an unbiased fashion with a very low amount of off-target material. For a small set of samples, it would be possible to sequence the same region with PCR and Sanger sequencing with an estimated 384-well plate of reactions for each sample. We are however convinced that the proposed scheme is more practical also for small collections of samples.
Compared to the previously published version of the selector technology (13,29), the current protocol uses MDA instead of PCR. By eliminating the PCR primers in the probes, the oligonucleotide length, and thereby also cost, is significantly reduced. Probe cost has further been reduced through the introduction of a solid-phase purification step to remove excess probe molecules, replacing the enzymatic degradation used in the previous protocol, which required expensive uracil residues in the probes. The invasive cleavage that was used to generate the majority of fragments in the previous protocol was avoided in this version, since we observed worse bias when it was used (data not shown). Thus in this study, we only used restriction digestion to generate the fragments. We have further increased design redundancy, which should improve coverage. It is difficult to assess whether MDA introduces less or more amplification bias compared to PCR, because of the probe redundancy and since different sequencing platforms were used to sequence the PCR amplicons (454, long read) and the MDA products (SOLiD, short read). In contrast to PCR, the MDA amplification generates a double-stranded high-molecular weight amplification product that is very similar in nature to genomic DNA. We have demonstrated compatibility with the standard sample preparation procedures for both SOLiD and Illumina short-read instruments. In contrast, to be compatible with the short-read-sequencing platforms, a PCR product requires concatemerization by ligation, followed by fragmentation, which is a more complicated scheme. For emulsion PCR, an additional fragmentation step is also required before the PCR, resulting in a very laborious procedure. The MDA also avoids clonal propagation of polymerization artifacts by employing rolling circle amplification, a feature valuable for detection of rare variants.
In addition to the importance of maintaining high specificity, only enriching regions that are targeted, it is also important to minimize the number of near-target base pairs that are amplified. The restriction enzyme-based approach of the selector technology provides a very high on-target rate by design: between 50 and 70% of the amplicons are on exons in the majority of designs (30). This is comparable to, and even slightly better than, published work using parallel PCR on the Raindance platform (10). In that article, exons with an average size of 167 bp were targeted with amplicons of between 300 and 600 bp, which will generate a significant proportion of near-exon sequence. The selector amplicons range between 100 and 1000 bp and can more easily be adjusted to fit the target length when selecting from eight different restriction enzyme reactions (30).
Multiplex targeted sequencing is of particular interest for molecular cancer diagnostics as genetic aberrations in growth-factor-signaling pathways can be decisive
Sample    Chr (hg18)    Position (hg18)    Gene    Reference sequence    Normal (Ratio, A, G, C, T)    Tumor (Ratio, A, G, C, T)    T.Ratio - N.Ratio (a)    Call    Location
HCC1143 NC_000005.8 112190753 APC T 0.38 0 0 52 32 1.00 0 0 0 78 0.62 LOH, CNN ROIb
HCC1143 NC_000005.8 112203669 APC G 0.57 65 85 0 0 1.00 0 160 0 0 0.43 LOH, CNN ROIb
HCC1143 NC_000005.8 112204224 APC G 0.61 189 292 0 0 1.00 3 597 0 0 0.39 LOH, CNN ROIb
HCC1143 NC_000005.8 112205070 APC G 0.61 43 68 0 0 1.00 0 144 0 0 0.39 LOH, CNN ROIb
HCC1143 NC_000007.12 116126908 MET C 0.62 0 0 114 69 0.00 0 0 0 43 0.62 LOH ROIb
HCC1143 NC_000007.12 116127498 MET A 0.74 62 22 0 0 0.08 2 21 0 0 0.65 LOH ROIb
HCC1143 NC_000007.12 116223258 MET G 0.57 116 152 0 0 0.99 1 114 0 0 0.42 LOH ROIb
HCC1143 NC_000007.12 116223333 MET G 0.59 75 136 0 0 1.00 0 89 0 0 0.41 LOH ROIb
HCC1143 NC_000009.10 21961039 CDKN2A G 1.00 0 28 0 0 0.00 0 0 0 0 1 ROIb
HCC1143 NC_000011.8 69172091 CCND1 G 0.49 70 68 0 0 0.08 531 146 0 0 0.41 LOH, AMP ROIb
HCC1143 NC_000011.8 93865568 MRE11A C 0.63 0 0 20 12 1.00 0 0 27 0 0.38 LOH ROIb
HCC1143 NC_000011.8 107668697 ATM C 0.66 0 0 193 100 0.05 0 0 5 87 0.60 LOH ROIb
HCC1143 NC_000017.9 26577611 NF1 G 0.7 12 28 0 0 0.00 14 0 0 0 0.70 LOH ROIb
HCC1143 NC_000017.9 35137563 HER2 C 0.75 0 6 18 0 0.00 0 27 0 0 0.75 LOH ROIb
HCC1599 NC_000005.8 112182868 APC C 0.68 0 2 393 179 0.15 0 0 32 179 0.53 LOH ROIb
HCC1599 NC_000005.8 112203516 APC T 0.62 85 0 0 136 0.99 1 0 0 86 0.37 LOH ROIb
HCC1599 NC_000005.8 112206694 APC G 0.56 147 187 1 0 0.03 117 4 0 0 0.53 LOH ROIb
HCC1599 NC_000007.12 55181842 EGFR C 0.67 1 0 177 87 0.99 0 0 155 2 0.32 LOH ROIb
HCC1599 NC_000017.9 7520197 TP53 G 0.57 0 26 19 1 1.00 0 10 0 0 0.43 LOH ROIb
HCC1599 NC_000017.9 26610220 NF1 A 0.48 29 31 0 0 0.85 23 4 0 0 0.37 LOH ROIb
HCC1599 NC_000019.8 35006506 CCNE1 C 0.65 0 0 108 57 0.16 0 0 75 401 0.50 AMP ROIb
HCC1143 NC_000002.10 212252269 HER4 G 0.52 14 15 0 0 0.21 19 5 0 0 0.31 Amp. Regionc
HCC1143 NC_000007.12 116185999 MET C 0.62 0 0 34 21 0.05 0 0 1 20 0.57 LOH Amp. Regionc
HCC1143 NC_000011.8 93851696 MRE11A C 0.69 0 0 27 12 1.00 0 0 23 0 0.31 LOH Amp. Regionc
HCC1143 NC_000011.8 93865455 MRE11A C 0.65 0 0 28 15 0.96 0 0 23 1 0.31 LOH Amp. Regionc
HCC1143 NC_000012.10 54763961 HER3 A 0.58 131 0 0 96 0.00 0 0 0 122 0.58 LOH Amp. Regionc
HCC1143 NC_000012.10 54764761 HER3 T 0.97 0 0 1 32 0.62 0 0 3 5 0.34 Amp. Regionc
HCC1143 NC_000012.10 54766850 HER3 G 0.46 21 18 0 0 0.00 22 0 0 0 0.46 LOH Amp. Regionc
HCC1143 NC_000012.10 54768447 HER3 T 0.68 0 15 0 32 0.06 0 16 0 1 0.62 LOH Amp. Regionc
HCC1143 NC_000012.10 54778054 HER3 A 1.00 195 0 0 0 0.05 3 60 0 0 0.95 A → G Amp. Regionc
HCC1143 NC_000017.9 26510278 NF1 G 0.48 24 22 0 0 1.00 0 49 0 0 0.52 LOH, CNN Amp. Regionc
HCC1143 NC_000017.9 26584058 NF1 C 0.65 32 0 59 0 0.10 47 0 5 0 0.55 LOH, CNN Amp. Regionc
HCC1143 NC_000017.9 26584383 NF1 G 0.64 11 21 0 0 1.00 0 40 0 0 0.36 LOH, CNN Amp. Regionc
HCC1143 NC_000017.9 26677419 NF1 T 0.7 0 0 25 57 1.00 0 0 0 96 0.30 LOH, CNN Amp. Regionc
HCC1143 NC_000017.9 26679002 NF1 T 0.6 81 4 0 129 1.00 0 0 0 225 0.40 LOH, CNN Amp. Regionc
HCC1143 NC_000017.9 35119531 HER2 C 0.53 0 0 75 66 0.00 0 0 0 145 0.53 LOH, CNN Amp. Regionc
HCC1143 NC_000017.9 35122241 HER2 C 0.6 0 0 36 24 0.02 0 0 1 43 0.58 LOH, CNN Amp. Regionc
HCC1599 NC_000001.9 241867813 AKT3 T 0.67 0 0 10 20 0.33 0 0 18 9 0.33 Amp. Regionc
HCC1599 NC_000002.10 212134711 HER4 A 0.59 24 0 17 0 0.13 1 0 7 0 0.46 LOH Amp. Regionc
HCC1599 NC_000002.10 212192049 HER4 T 0.67 0 0 7 14 1.00 0 0 0 2 0.33 Amp. Regionc
HCC1599 NC_000002.10 212252169 HER4 A 0.56 84 65 0 0 0.14 8 48 0 0 0.42 LOH Amp. Regionc
HCC1599 NC_000002.10 212252269 HER4 G 0.71 10 24 0 0 0.00 10 0 0 0 0.71 LOH Amp. Regionc
HCC1599 NC_000002.10 212323573 HER4 C 0.67 0 0 250 122 0.98 0 0 208 4 0.31 LOH Amp. Regionc
HCC1599 NC_000002.10 212323761 HER4 T 0.63 0 0 39 66 0.29 0 0 25 10 0.34 LOH Amp. Regionc
HCC1599 NC_000004.10 153464702 FBXW7 C 0.43 0 12 9 0 0.00 0 21 0 0 0.43 LOH Amp. Regionc
HCC1599 NC_000004.10 153551830 FBXW7 T 0.61 0 11 0 17 1.00 0 0 0 26 0.39 LOH Amp. Regionc
HCC1599 NC_000007.12 55182141 EGFR G 0.45 0 14 0 17 1.00 0 8 0 0 0.55 LOH Amp. Regionc
HCC1599 NC_000007.12 55187671 EGFR A 0.66 61 32 0 0 0.97 57 2 0 0 0.31 LOH Amp. Regionc
HCC1599 NC_000007.12 55196682 EGFR G 0.55 94 116 0 0 1.00 0 97 0 0 0.45 LOH Amp. Regionc
HCC1599 NC_000007.12 55227257 EGFR A 0.66 78 41 0 0 0.03 1 28 0 0 0.62 LOH Amp. Regionc
HCC1599 NC_000007.12 55240320 EGFR G 0.77 17 56 0 0 0.00 9 0 0 0 0.77 LOH Amp. Regionc
HCC1599 NC_000012.10 25269723 KRAS T 0.76 0 0 6 19 0.00 0 0 4 0 0.76 Amp. Regionc
HCC1599 NC_000012.10 54764729 HER3 A 1.00 20 0 0 0 0.67 2 1 0 0 0.33 Amp. Regionc
HCC1599 NC_000012.10 54764863 HER3 T 1.00 0 0 0 25 0.60 0 0 2 3 0.40 Amp. Regionc
HCC1599 NC_000012.10 54766915 HER3 G 0.7 17 40 0 0 0.33 24 12 0 0 0.37 Amp. Regionc
The number of each base call is shown for both the normal and the tumor sample; the reference allele ratios, the difference in reference allele ratio and our interpretation of the probable cause of the ratio difference are listed for all positions.
a Only base positions with an absolute ratio difference above 0.3 between tumor and normal are listed.
b The position is located within the region of interest.
c The position is located within the amplified region but not in the region of interest.
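The ratio columns of the table above can be reproduced from the base counts. The following sketch is an assumption-laden illustration rather than the authors' implementation: it takes the reference allele ratio to be the reference base count divided by the total A, G, C and T count, and flags a position when the absolute tumor/normal difference exceeds 0.3. All names are hypothetical.

```python
BASES = ("A", "G", "C", "T")


def ref_allele_ratio(counts: dict, ref: str) -> float:
    """Reference base count divided by the total number of called bases."""
    total = sum(counts[base] for base in BASES)
    if total == 0:
        raise ValueError("position has no coverage")
    return counts[ref] / total


def ratio_shift(normal: dict, tumor: dict, ref: str, threshold: float = 0.3):
    """Return (normal ratio, tumor ratio, absolute difference, flagged?)."""
    n_ratio = ref_allele_ratio(normal, ref)
    t_ratio = ref_allele_ratio(tumor, ref)
    diff = abs(t_ratio - n_ratio)
    return n_ratio, t_ratio, diff, diff > threshold


# First HCC1143 APC row of the table: reference T, normal C=52/T=32, tumor T=78.
normal = {"A": 0, "G": 0, "C": 52, "T": 32}
tumor = {"A": 0, "G": 0, "C": 0, "T": 78}
n_ratio, t_ratio, diff, flagged = ratio_shift(normal, tumor, "T")
```

For this row the sketch yields a normal ratio of about 0.38, a tumor ratio of 1.00 and a difference of about 0.62, matching the tabulated values. Note that the table lists a tumor ratio of 0.00 for the CDKN2A position, which has no tumor reads; this sketch raises an error for uncovered positions instead, which is a design choice made here.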
Figure 4. (a) Mutation detection and CNV analysis of a lung cancer patient sample with matched control. Upper panel: for each position in the targeted region, the number of bases called in the patient-matched normal tissue is subtracted from the number of bases called in the tumor-derived DNA, and the difference is divided by the sum of called bases for that position in the two samples. The exons of the 28 genes are lined up after each other and genes are demarcated by alternating background color. Middle panel: the inferred gene copy-number variation in the corresponding genomic loci, illustrated by log2 ratios (pink line) derived from SNP array data (Affymetrix Gene Chip Mapping 250K arrays). Lower panel: the allelic ratio between the major and minor allele at each position is compared between the two samples by subtraction. (b) A correlation plot between the Affymetrix Gene Chip log2 tumor/normal signal ratio and the log2 tumor/normal sequencing read depth ratio. (c) Detection of a single base pair deletion in the TP53 gene. Forward (brown) and reverse (blue) reads are aligned to a 15-bp region of the TP53 gene. Deleted bases are indicated by dashed lines. Alignment visualized in Integrative Genomics Viewer (IGV ver.1.4.2). (d) Detection of the same mutation in the FFPE sample from the same tumor.
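The per-position quantity plotted in the upper panel of (a) and the read-depth ratio used in (b) can be sketched as below. This is a hedged illustration: the function names are hypothetical, the `pseudocount` parameter is an assumption added here to avoid division by zero, and the sketch assumes the depths have already been made comparable between samples, a step the caption does not describe.

```python
import math
from typing import Optional


def normalized_depth_difference(tumor_depth: int, normal_depth: int) -> Optional[float]:
    """(tumor - normal) / (tumor + normal); None when neither sample has coverage."""
    total = tumor_depth + normal_depth
    if total == 0:
        return None
    return (tumor_depth - normal_depth) / total


def log2_depth_ratio(tumor_depth: float, normal_depth: float,
                     pseudocount: float = 1.0) -> float:
    """log2 of the tumor/normal read depth ratio, with an assumed pseudocount."""
    numerator = tumor_depth + pseudocount
    denominator = normal_depth + pseudocount
    if numerator <= 0 or denominator <= 0:
        raise ValueError("depth plus pseudocount must be positive")
    return math.log2(numerator / denominator)
```

The normalized difference is bounded between -1 and 1, so positions with very different absolute coverage can share one axis; for instance, hypothetical depths of 300 (tumor) and 100 (normal) give 0.5.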
22. Dean,F.B., Hosono,S., Fang,L., Wu,X., Faruqi,A.F., Bray-Ward,P., Sun,Z., Zong,Q., Du,Y., Du,J. et al. (2002) Comprehensive human genome amplification using multiple displacement amplification. Proc. Natl Acad. Sci. USA, 99, 5261–5266.
23. Stenberg,J., Zhang,M. and Ji,H. (2009) Disperse–a software system for design of selector probes for exon resequencing applications. Bioinformatics, 25, 666–667.
24. Stenberg,J., Nilsson,M. and Landegren,U. (2005) ProbeMaker: an extensible framework for design of sets of oligonucleotide probes. BMC Bioinformatics, 6, 229.
25. Olshen,A.B., Venkatraman,E.S., Lucito,R. and Wigler,M. (2004) Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5, 557–572.
27. Sjöblom,T., Jones,S., Wood,L.D., Parsons,D.W., Lin,J., Barber,T.D., Mandelker,D., Leary,R.J., Ptak,J., Silliman,N. et al. (2006) The consensus coding sequences of human breast and colorectal cancers. Science, 314, 268–274.
28. Sanger,F. and Coulson,A.R. (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol., 94, 441–448.
29. Dahl,F., Gullberg,M., Stenberg,J., Landegren,U. and Nilsson,M. (2005) Multiplex amplification enabled by selective circularization of large sets of genomic DNA fragments. Nucleic Acids Res., 33, e71.
30. Stenberg,J., Dahl,F., Landegren,U. and Nilsson,M. (2005) PieceMaker: selection of DNA fragments for selector-guided multiplex amplification. Nucleic Acids Res., 33, e72.
26. Frazer,K.A., Ballinger,D.G., Cox,D.R., Hinds,D.A., Stuve,L.L.,
Gibbs,R.A., Belmont,J.W., Boudreau,A., Hardenbol,P., Leal,S.M.