Abstract
As a base for human transcriptome and functional genomics, we created the “full-length long Japan” (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at ∼58% compared with a peak at ∼42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at ∼42%, relatively low compared with that of protein-coding cDNAs.
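The GC-content comparison in the abstract rests on a simple per-sequence statistic. As a minimal sketch (not the authors' pipeline, which the abstract does not describe), GC content and a binned distribution might be computed as follows. The function names `gc_content` and `gc_histogram`, the 2% bin width, and the toy sequences are assumptions made purely for illustration.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


def gc_histogram(seqs, bin_width=2):
    """Count sequences per GC-content bin; each key is the bin's lower edge in percent."""
    counts = {}
    for s in seqs:
        lower = int(gc_content(s) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the FLJ collection.
    toy = ["ATGC", "GGCC", "AAAT"]
    for s in toy:
        print(s, gc_content(s))
    print(gc_histogram(toy))
```

The peak of a distribution such as those reported above would correspond to the bin with the largest count. A real analysis would use the full cDNA sequences and might choose a different bin width.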
Acknowledgements
We thank A. Kishimoto, H. Ezoe and T. Matsuo for supporting the project and E. Nakajima for critically reading the manuscript. This project was supported by the Ministry of Economy, Trade and Industry of Japan and also in part by Special Coordination Funds for Promoting Science and Technology from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Requests for materials should be addressed to S. Sugano. Requests for physical cDNA clones should be addressed to S. Sugano (flcdna@ims.u-tokyo.ac.jp) or T. Isogai (isogai-t@reprori.jp). For more information on each cDNA clone, visit FLJ-DB. For general information on the FLJ project, please refer to the NEDO website.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Cite this article
Ota, T., Suzuki, Y., Nishikawa, T. et al. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet 36, 40–45 (2004). https://doi.org/10.1038/ng1285
DOI: https://doi.org/10.1038/ng1285