Factsheet: Eukaryotic Genome Annotation
Enhancing the value of assembled genomes through annotation using a standardized pipeline
https://www.ncbi.nlm.nih.gov/genome/annotation_euk
National Center for Biotechnology Information • National Library of Medicine • National Institutes of Health • Department of Health and Human Services
Overview
The NCBI Eukaryotic Genome Annotation Pipeline provides content for various NCBI resources, including the Nucleotide, Protein, and Gene databases, the BLAST sequence alignment services, and GDV, the Genome Data Viewer. The pipeline uses a modular framework to execute all annotation tasks, from fetching primary and curated data from public repositories (the NCBI sequence and Assembly databases) to aligning sequences, predicting genes, generating annotated genomic sequence records, and finally submitting the accessioned annotation products to public databases.
Core components of the pipeline are alignment programs (Splign and ProSplign) and an HMM-based prediction program
(Gnomon) developed at NCBI. Important features of this annotation pipeline include:
• Flexibility and speed
• Reliance on experimental data, in particular RNA-Seq data
• Production of models that compensate for assembly issues
• Tracking of gene loci from one annotation to the next
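The modular design described above, in which each annotation task runs as a separate step in a fixed sequence, can be illustrated with a minimal Python sketch. The `Pipeline` class, the stage functions, and the shared context dictionary are all hypothetical stand-ins, not NCBI's implementation; the stub stages only mimic the order of tasks listed in the overview (fetch data, align, predict genes, build records, submit).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Shared state passed from stage to stage (hypothetical design).
Context = Dict[str, Any]
Stage = Callable[[Context], None]


@dataclass
class Pipeline:
    """Runs named stages in the order they were registered."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add_stage(self, name: str, func: Stage) -> "Pipeline":
        self.stages.append((name, func))
        return self  # allow chained registration

    def run(self, ctx: Context) -> Context:
        ctx.setdefault("log", [])
        for name, func in self.stages:
            func(ctx)                # each stage reads and writes the context
            ctx["log"].append(name)  # record each completed stage
        return ctx


# Stub stages standing in for the tasks named in the overview.
def fetch_inputs(ctx: Context) -> None:
    ctx["inputs"] = ["genomic sequence", "RNA-Seq reads"]


def align_sequences(ctx: Context) -> None:
    ctx["alignments"] = [f"aligned:{item}" for item in ctx["inputs"]]


def predict_genes(ctx: Context) -> None:
    ctx["models"] = ["model_1"] if ctx["alignments"] else []


def build_records(ctx: Context) -> None:
    ctx["records"] = [f"record:{m}" for m in ctx["models"]]


def submit_products(ctx: Context) -> None:
    ctx["submitted"] = len(ctx["records"])


pipeline = (
    Pipeline()
    .add_stage("fetch", fetch_inputs)
    .add_stage("align", align_sequences)
    .add_stage("predict", predict_genes)
    .add_stage("build_records", build_records)
    .add_stage("submit", submit_products)
)
result = pipeline.run({})
print(result["log"])        # ['fetch', 'align', 'predict', 'build_records', 'submit']
print(result["submitted"])  # 1
```

Because each stage communicates only through the shared context, a stage can in principle be replaced or rerun without editing the others, which is the kind of flexibility a modular framework is meant to provide.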
Data Access
The products of an annotation run (chromosomes, scaffolds, and model transcripts and proteins) are labeled with an Annotation Release number. The Annotation Release name combines the organism name and the Annotation Release number (e.g. NCBI Papio anubis Annotation Release 104) and is used throughout NCBI to uniquely identify annotation products originating from the same annotation run.
Annotated Organisms
Only genomes with assemblies that are public in INSDC (DDBJ, ENA, or GenBank) are considered for annotation by the Eukaryotic Genome Annotation Pipeline. NCBI selects among these genomes based on several factors, including:
• NIH/NCBI priorities
• Assembly quality
• Biological, evolutionary, or economic importance
• Public availability of supporting transcript evidence
• Community interest
See the full list of annotated organisms online at www.ncbi.nlm.nih.gov/genome/annotation_euk/all/.
Request an annotation at: go.usa.gov/xpsEJ
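The eligibility rule at the start of this section is a hard gate, while the factors that follow are judgment calls. A minimal Python sketch of the gate alone, assuming a hypothetical `Assembly` record that lists the INSDC databases in which an assembly is public; the organism names are placeholders, and passing the check only makes an assembly a candidate, not a selection.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# The three INSDC member databases named in the factsheet.
INSDC_MEMBERS = frozenset({"DDBJ", "ENA", "GenBank"})


@dataclass(frozen=True)
class Assembly:
    """Hypothetical record of an assembly and where it is publicly available."""
    organism: str
    public_in: FrozenSet[str] = frozenset()


def eligible_for_annotation(assembly: Assembly) -> bool:
    # Gate only: the assembly must be public in at least one INSDC database.
    # The remaining selection factors (priorities, quality, etc.) are not modeled.
    return bool(assembly.public_in & INSDC_MEMBERS)


candidates: List[Assembly] = [
    Assembly("species_a", frozenset({"GenBank"})),
    Assembly("species_b"),  # not yet public anywhere
]
eligible = [a.organism for a in candidates if eligible_for_annotation(a)]
print(eligible)  # ['species_a']
```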