Abstract
We initiate the study of the smoothed complexity of sequence alignment, by proposing a semi-random model of edit distance between two input strings, generated as follows. First, an adversary chooses two binary strings of length d and a longest common subsequence A of them. Then, every character is perturbed independently with probability p, except that A is perturbed in exactly the same way inside the two strings.
We design two efficient algorithms that compute the edit distance on smoothed instances up to a constant factor approximation. The first algorithm runs in near-linear time, namely d 1 + ε for any fixed ε> 0. The second one runs in time sublinear in d, assuming the edit distance is not too small. These approximation and runtime guarantees are significantly better then the bounds known for worst-case inputs, e.g. near-linear time algorithm achieving approximation roughly d 1/3, due to Batu, Ergün, and Sahinalp [SODA 2006].
Our technical contribution is twofold. First, we rely on finding matches between substrings in the two strings, where two substrings are considered a match if their edit distance is relatively small, a prevailing technique in commonly used heuristics, such as PatternHunter of Ma, Tromp and Li [Bioinformatics, 2002]. Second, we effectively reduce the smoothed edit distance to a simpler variant of (worst-case) edit distance, namely, edit distance on permutations (a.k.a. Ulam’s metric). We are thus able to build on algorithms developed for the Ulam metric, whose much better algorithmic guarantees usually do not carry over to general edit distance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: A basic local alignment search tool. J. of Molecular Biology 215(3), 403–410 (1990)
Andoni, A., Indyk, P., Krauthgamer, R.: Overcoming the ℓ1 non-embeddability barrier: Algorithms for product metrics (manuscript, 2008)
Andoni, A., Krauthgamer, R.: The computational hardness of estimating edit distance. In: 48th Annual IEEE Symposium on Foundations of Computer Science, pp. 724–734. IEEE, Los Alamitos (2007)
Batu, T., Ergün, F., Kilian, J., Magen, A., Raskhodnikova, S., Rubinfeld, R., Sami, R.: A sublinear algorithm for weakly approximating edit distance. In: Proceedings of the Symposium on Theory of Computing, pp. 316–324 (2003)
Batu, T., Ergün, F., Sahinalp, C.: Oblivious string embeddings and edit distance approximations. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 792–801 (2006)
Bar-Yossef, Z., Jayram, T.S., Krauthgamer, R., Kumar, R.: Approximating edit distance efficiently. In: Proceedings of the Symposium on Foundations of Computer Science, pp. 550–559 (2004)
Blum, A., Spencer, J.: Coloring random and semi-random k-colorable graphs. J. Algorithms 19(2), 204–234 (1995)
Charikar, M., Krauthgamer, R.: Embedding the ulam metric into ℓ1. Theory of Computing 2(11), 207–224 (2006)
Feige, U., Kilian, J.: Heuristics for semirandom graph problems. J. Comput. Syst. Sci. 63(4), 639–673 (2001)
Frieze, A., McDiarmid, C.: Algorithmic theory of random graphs. Random Structures and Algorithms 10(1-2), 5–42 (1997)
Gollapudi, S., Panigrahy, R.: A dictionary for approximate string search and longest prefix search. In: 15th ACM international conference on Information and knowledge management, pp. 768–775. ACM, New York (2006)
Masek, W.J., Paterson, M.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Engineering Bulletin 24(4), 19–27 (2001); Special issue on Text and Databases. Invited paper
Spielman, D.A., Teng, S.-H.: Smoothed analysis: Motivation and discrete models. In: Dehne, F., Sack, J.-R., Smid, M. (eds.) WADS 2003. LNCS, vol. 2748. Springer, Heidelberg (2003)
Spielman, D.A., Teng, S.-H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. ACM 51(3), 385–463 (2004)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andoni, A., Krauthgamer, R. (2008). The Smoothed Complexity of Edit Distance. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds) Automata, Languages and Programming. ICALP 2008. Lecture Notes in Computer Science, vol 5125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70575-8_30
Download citation
DOI: https://doi.org/10.1007/978-3-540-70575-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70574-1
Online ISBN: 978-3-540-70575-8
eBook Packages: Computer ScienceComputer Science (R0)