Abstract
In this paper we introduce the Weighted Suffix Tree, an efficient data structure for computing string regularities in weighted sequences of molecular data. Molecular Weighted Sequences can model important biological processes such as the DNA Assembly Process or the DNA-Protein Binding Process. Thus pattern matching or identification of repeated patterns, in biological weighted sequences is a very important procedure in the translation of gene expression and regulation. We present time and space efficient algorithms for constructing the weighted suffix tree and some applications of the proposed data structure to problems taken from the Molecular Biology area such as pattern matching, repeats discovery, discovery of the longest common subsequence of two weighted sequences and computation of covers.
Chapter PDF
Similar content being viewed by others
References
Apostolico, A., Farach, M., Iliopoulos, C.S.: Optimal superprimitivity testing for strings, Information Processing Letters, 39, (1991) 17–20.
Apostolico, A., Preparata, F.P.,: Optimal off-line detection of repetitions in a string. Theoretical Computer Science, Vol. 22. (1983) 297–315.
Brodal G.S., Lyngso R.B., Storm Pedersen C.N., and Stoye J.: Finding Maximal Pairs with Bounded Gap. In Proc. 10th CPM, pp. 134–149, (1999).
Celera Genomics: The Genome Sequence of Drosophila melanogaster. Science, Vol. 287. (2000) 2185–2195
Celera Genomics: The Sequence of the Human Genome. Science, Vol. 291, (2001) 1304–1351.
Crochemore, M.: An Optimal Algorithm for Computing the Repetitions in a Word. Inf. Proc. Lett., Vol. 12. (1981) 244–250.
G. Grillo, F. Licciuli, S. Liuni, E. Sbisa, G. Pesole PatSearch: a program for the detection of patterns and structural motifs in nucleotide sequences. Nucleic Acids Res. 31 (2003), 3608–3612.
Gusfield, D.: Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York (1997)
Iliopoulos, C., Makris, Ch, Panagis, I., Perdikuri, K., Theodoridis, E., Tsakalidis, A.: Computing the Repetitions in a Weighted Sequence using Weighted Suffix Trees, In European Conference on Computational Biology (ECCB0 2003), Posters’ Track.
Iliopoulos, C., Mouchard, L., Perdikuri, K., Tsakalidis, A.,: Computing the repetitions in a weighted sequence, Proceedings of the Prague Stringology Conference (PSC) 2003), 91–98.
Kolpakov, R., Kucherov, G.,: Finding maximal repetitions in a word in linear time. In Proc. FOCS99, pp. 596–604, (1999).
Kurtz, S., Schleiermacher, C.,: REPuter: fast computation of maximal repeats in complete genomes. Bioinformatics, Vol. 15, (1999) 426–427.
H. Li, V. Rhodius, C. Gross, E. Siggia Identification of the binding sites of regulatory proteins in bacterial genomes Genetics 99 (2002), 11772–11777.
McCreight, E., M.,: A space-economical suffix tree construction algorithm. J. of the ACM, Vol. 23, (1976) 262–272.
Stoye, J., Gusfield, D.,: Simple and flexible detection of contiguous repeats using a suffix tree. In Proc. 9th CPM, Vol. 1448 of LNCS, (1998) 140–152.
Tsunoda, T., Fukagawa, M., Takagi, T.,: Time and memory efficient algorithm for extracting palindromic and repetitive subsequences in nucleic acid sequences. Pacific Symposium on Biocomputing, Vol. 4, (1999) 202–213.
Ukkonen, E.,:On-line construction of suffix trees. Algorithmica, Vol. 14, (1995), 249–260.
van Emde Boas P., R. Kaas and E. Zijlstra, Design and implementation of an efficient priority queue. Mathematical Systems Theory, 10, pp. 99–127, (1977)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer Science + Business Media, Inc.
About this paper
Cite this paper
Iliopoulos, C.S., Makris, C., Panagis, Y., Perdikuri, K., Theodoridis, E., Tsakalidis, A. (2004). Efficient Algorithms for Handling Molecular Weighted Sequences. In: Levy, JJ., Mayr, E.W., Mitchell, J.C. (eds) Exploring New Frontiers of Theoretical Informatics. IFIP International Federation for Information Processing, vol 155. Springer, Boston, MA. https://doi.org/10.1007/1-4020-8141-3_22
Download citation
DOI: https://doi.org/10.1007/1-4020-8141-3_22
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-8140-8
Online ISBN: 978-1-4020-8141-5
eBook Packages: Springer Book Archive