Complete sequencing and characterization of 21,243 full-length human cDNAs

Abstract

As a base for human transcriptome analysis and functional genomics, we created the “full-length long Japan” (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of these clusters (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at 58%, compared with a peak at 42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the clusters unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at 42%, relatively low compared with that of protein-coding cDNAs.
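The GC-content comparison in the abstract rests on two simple quantities: the GC percentage of each cDNA and the most populated bin of the resulting distribution. The following is a minimal sketch of that calculation, not the authors' pipeline. The function names, the 2% bin width, and the toy sequences are all assumptions made for illustration; none of them come from the FLJ data.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Ambiguous bases such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * gc / total


def modal_gc_bin(seqs, bin_width: int = 2) -> int:
    """Return the lower edge (in %) of the most populated GC-content bin.

    Ties are broken in favour of the lower bin.
    """
    counts = Counter(int(gc_content(s) // bin_width) * bin_width for s in seqs)
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return best_bin


# Hypothetical toy sequences, used only to exercise the functions.
toy_sequences = ["ATGCATGCAT", "GGGCCCATAT", "ATATATATGC", "ATGCATATAT"]
peak = modal_gc_bin(toy_sequences)
```

Applied separately to two groups of sequences (for example, predicted and nonpredicted cDNAs), `modal_gc_bin` would give the kind of peak positions the abstract compares. A real analysis would also need to choose a bin width and handle sequence quality, which this sketch does not attempt.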

Figure 1: Flow chart of cDNA categorization.
Figure 2: GC contents of the FLJ cDNAs and the corresponding genomic regions to which they were mapped.

Acknowledgements

We thank A. Kishimoto, H. Ezoe and T. Matsuo for supporting the project and E. Nakajima for critically reading the manuscript. This project was supported by the Ministry of Economy, Trade and Industry of Japan and also in part by Special Coordination Funds for Promoting Science and Technology from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Requests for materials should be addressed to S. Sugano. Requests for physical cDNA clones should be addressed to S. Sugano (flcdna@ims.u-tokyo.ac.jp) or T. Isogai (isogai-t@reprori.jp). For more information on each cDNA clone, visit FLJ-DB. For general information on the FLJ project, please refer to the NEDO website.

Author information

Corresponding author

Correspondence to Sumio Sugano.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Ota, T., Suzuki, Y., Nishikawa, T. et al. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat Genet 36, 40–45 (2004). https://doi.org/10.1038/ng1285

  • DOI: https://doi.org/10.1038/ng1285
