NGS QC Metrics
https://blog.horizondiscovery.com/diagnostics/the-5-ngs-qc-metrics-you-should-know
We all know how vital quality control is for our samples. A lot of research has gone into developing useful QC metrics for genomics experiments,
primarily because of their high cost. Skipping this step risks wasting both time and money.
There are 3 main areas where QC can be applied to NGS:
Ultimately the best QC of your NGS experiments is likely to come from the sequence data and, because of this, many labs will run a QC lane first.
NGS QC metrics:
1. Sample Quality Control:
Usually you would start an NGS experiment with high-quality DNA and/or RNA samples. However, many experiments are performed with
degraded nucleic acids. Often you rely on spectrophotometric (NanoDrop), fluorimetric (PicoGreen and RiboGreen) and gel electrophoretic
methods (Bioanalyzer) to QC starting material. Agilent’s RNA Integrity Number (RIN) provides a robust and non-subjective method for RNA
QC, with most experiments using samples with RIN>7.
4.1. Yield:
Yield is the number of bases generated in the run. Yield is important to all users, but is usually something your service provider will
guarantee so you don’t need to worry about it.
4.3. %Q30:
The percentage of bases with a quality score of 30 or higher (see “Quality Scores Explained” below). Most Illumina runs will
generate >70-80% Q30 data. This value is an average across the whole read length, and error rate increases towards the end of the reads.
Because of this, a run can “fail” at the end of a long read but still pass Illumina’s specs for the run with respect to Q30. If a read is Q40 for
bases 1-100 and Q10 for bases 101-150, it may still pass the Q30 spec, but if you need the ends of the reads to be high quality, you may be
disappointed.
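As a rough illustration of the Q40/Q10 example above, the sketch below converts Phred scores to error probabilities and compares the whole-read figure with the tail of the read. The function names and the synthetic 150-base read are assumptions made for illustration; they are not part of any Illumina tool.

```python
def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def percent_at_or_above(quals, threshold=30):
    """Percentage of bases whose quality score is >= threshold."""
    return 100.0 * sum(q >= threshold for q in quals) / len(quals)


# Synthetic read from the text: Q40 for bases 1-100, Q10 for bases 101-150.
read = [40] * 100 + [10] * 50

whole_read_q30 = percent_at_or_above(read)    # about 66.7% of bases are >= Q30
tail_q30 = percent_at_or_above(read[100:])    # 0% of the last 50 bases are >= Q30
mean_q = sum(read) / len(read)                # the mean quality score is exactly 30

print(f"%Q30 whole read: {whole_read_q30:.1f}")
print(f"%Q30 last 50 bases: {tail_q30:.1f}")
print(f"mean Q: {mean_q:.1f}")
print(f"error probability at Q10: {phred_to_error_prob(10):.2f}")
```

The whole-read summary looks respectable while every base in the last 50 cycles has a 1 in 10 chance of being wrong. Whether such a read clears a given spec depends on the threshold your provider quotes, so it is worth checking quality per cycle as well as the run-level number.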
The Chastity filter works by calculating the ratio of the highest base intensity to the sum of the 1st and 2nd highest; anything less than 0.6 is
filtered out. If a cluster was formed from a single molecule then the chastity score will be 1; if it was formed from two molecules the signals
would be equal and the chastity score would be 0.5.
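The ratio described above can be sketched in a few lines. This is a minimal illustration of the arithmetic for a single cycle, assuming four hypothetical intensity values per cluster; how the instrument software combines cycles is not modelled here.

```python
def chastity(intensities):
    """Highest intensity divided by the sum of the two highest intensities."""
    top, second = sorted(intensities, reverse=True)[:2]
    total = top + second
    # A cluster with no signal at all cannot be scored; treat it as failing.
    return top / total if total > 0 else 0.0


def passes_chastity(intensities, threshold=0.6):
    """Anything below the threshold (0.6 in the text) is filtered out."""
    return chastity(intensities) >= threshold


# Hypothetical per-base intensities for one cycle.
pure_cluster = [1000, 0, 0, 0]      # single molecule: all signal in one channel
mixed_cluster = [500, 500, 0, 0]    # two molecules with equal signal

print(chastity(pure_cluster), passes_chastity(pure_cluster))
print(chastity(mixed_cluster), passes_chastity(mixed_cluster))
```

The pure cluster scores 1 and passes; the evenly mixed cluster scores 0.5 and is filtered.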
There are several possible causes of poor phasing/pre-phasing, but estimating it correctly requires a sample with balanced base
composition (25% of each base). If you know your sample is unbalanced, you may need to add extra PhiX control. Assuming your
sample is not the problem, the most likely causes are the reagents or the flowcell. Check the use-by date of reagents, check that there were no
problems with fluidics, and check that the temperature was not too high during the run.
5. Sequencing Reads:
If you are performing read-counting applications such as RNA-seq or ChIP-seq then the number of reads is likely to be more important than yield.
Sequencing service providers sometimes use the two interchangeably, so make sure you know which is most important to you.
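Yield and read count are linked by read length: yield in bases is the number of reads multiplied by the bases sequenced per read. The sketch below shows why the same yield can mean very different read counts. The function name and the 30 Gb figure are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def fragments_from_yield(yield_bases, read_length, paired_end=False):
    """Number of sequenced fragments (reads, or read pairs) implied by a yield."""
    bases_per_fragment = read_length * (2 if paired_end else 1)
    return yield_bases // bases_per_fragment


yield_bases = 30_000_000_000  # hypothetical 30 Gb guarantee

# Paired-end 150 bp: 300 bases per fragment.
pairs_pe150 = fragments_from_yield(yield_bases, 150, paired_end=True)

# Single-end 50 bp: 50 bases per fragment.
reads_se50 = fragments_from_yield(yield_bases, 50)

print(f"2x150: {pairs_pe150:,} read pairs")
print(f"1x50:  {reads_se50:,} reads")
```

For a counting experiment the second configuration delivers six times as many countable fragments from the same yield, which is why it helps to ask for a guarantee in the unit that matters to you.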
Per Base Sequence Quality, which plots the Q-score of the raw sequence reads as a box-plot for each cycle. Higher is always better, and a
characteristic quality decay is seen in most runs.
Per Base Sequence content, which plots the proportion of each base at each cycle. In a random fragment library from a “normal” genome you
would expect to see all four bases equally represented. Deviation from normal base content can indicate issues with library quality, but equally
some genomes are very GC biased and some NGS applications also introduce a strong GC bias, e.g. Bis-seq.
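The per-cycle quality summary behind the first of these plots can be approximated as follows. This is a simplified sketch, not FastQC's implementation: it assumes quality strings in the Phred+33 encoding used by current FASTQ files, and it reports only the median per cycle rather than a full box-plot. The example quality strings are made up.

```python
from statistics import median


def per_cycle_quals(quality_strings, offset=33):
    """Group decoded Phred scores by cycle (position in the read)."""
    cycles = []
    for qs in quality_strings:
        for i, ch in enumerate(qs):
            if i == len(cycles):
                cycles.append([])
            # Assumed Phred+33: quality = ASCII code minus 33.
            cycles[i].append(ord(ch) - offset)
    return cycles


def per_cycle_median(quality_strings, offset=33):
    """Median quality score at each cycle."""
    return [median(c) for c in per_cycle_quals(quality_strings, offset)]


# Made-up quality lines from three reads: 'I' = Q40, 'F' = Q37, '5' = Q20, '+' = Q10.
quals = ["IIII5", "IIIF+", "III5+"]
print(per_cycle_median(quals))
```

Plotting these medians against cycle number gives the characteristic decay curve; a real tool would add quartiles and whiskers at each cycle.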
Sequence quality should be high, generally above Q30, along the length of a good Illumina read. The profile has changed over time; there is a clear
decay of read quality towards the end of the read, but read lengths of 150 bp are possible on HiSeq and up to 300 bp on MiSeq. The second read in
a paired-end run is always slightly lower quality than the first. Low-quality bases can be easily trimmed; lots of low-quality bases may indicate a
poor library, or problems with phasing. However, a sudden drop in quality is likely to indicate something happened during the run, or that there are
short fragments and you are reading into adapter. Tracking the quality profile can identify issues with sequencing chemistry and/or instruments,
and can help.
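The trimming of low-quality bases mentioned above can be sketched naively as below. This is a deliberately simple 3' trimmer written for illustration, with an assumed cutoff of Q20; established trimming tools use more sophisticated algorithms, and the function name and example read are hypothetical.

```python
def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end until one with quality >= min_q is found."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


# Hypothetical read: quality decays towards the 3' end.
seq = "ACGTACGT"
quals = [38, 37, 36, 35, 30, 12, 25, 8]

trimmed_seq, trimmed_quals = trim_3prime(seq, quals)
print(trimmed_seq, trimmed_quals)
```

Only the final base is removed here. The Q12 base in the middle of the read stays because this approach stops at the first acceptable base from the 3' end, which is one reason real trimmers use windowed or cumulative scoring instead.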
The Illumina technology produces the best data when all four bases are equally represented, e.g. in whole genome sequencing. There are several
reasons for this, all of which relate to the analysis algorithms used to detect and call bases on the sequencer. As you are unlikely to be modifying
these, the best suggestion is to monitor base composition (FastQC plot), and to understand when it is likely to vary because of the library type being
sequenced, e.g. RNA-seq, Nextera, or bisulfite-converted DNA.
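Monitoring base composition amounts to counting each base at each cycle. The sketch below shows the idea on a handful of made-up reads; it is not FastQC's code, and the function name is an assumption. Bases other than A, C, G and T (such as N) count towards the total but are not reported.

```python
from collections import Counter


def per_cycle_base_fractions(reads):
    """Fraction of A, C, G and T at each cycle across a set of reads."""
    n_cycles = max(len(r) for r in reads)
    fractions = []
    for i in range(n_cycles):
        # Reads shorter than this cycle simply do not contribute to it.
        counts = Counter(r[i] for r in reads if i < len(r))
        total = sum(counts.values())
        fractions.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return fractions


# Made-up reads: cycles 1 and 3 are completely unbalanced, cycles 2 and 4 are mixed.
reads = ["ACGT", "AGGT", "ATGA", "ACGC"]
for cycle, frac in enumerate(per_cycle_base_fractions(reads), start=1):
    print(cycle, frac)
```

In a balanced library each fraction sits near 0.25 at every cycle; a cycle where one base dominates, as in cycles 1 and 3 here, is the kind of pattern to expect from low-diversity or bisulfite-converted libraries.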
It will also be important to consider other QC metrics after alignment and analysis. The more time and effort you spend on QC, the better the quality
of your results and conclusions will be.