The Smoothed Complexity of Edit Distance

Andoni, Alexandr; Krauthgamer, Robert

doi:10.1007/978-3-540-70575-8_30

Alexandr Andoni⁷ &
Robert Krauthgamer⁸

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5125))

Included in the following conference series:

International Colloquium on Automata, Languages, and Programming

2218 Accesses

Abstract

We initiate the study of the smoothed complexity of sequence alignment, by proposing a semi-random model of edit distance between two input strings, generated as follows. First, an adversary chooses two binary strings of length d and a longest common subsequence A of them. Then, every character is perturbed independently with probability p, except that A is perturbed in exactly the same way inside the two strings.

We design two efficient algorithms that compute the edit distance on smoothed instances up to a constant factor approximation. The first algorithm runs in near-linear time, namely d ^1 + ε for any fixed ε> 0. The second one runs in time sublinear in d, assuming the edit distance is not too small. These approximation and runtime guarantees are significantly better then the bounds known for worst-case inputs, e.g. near-linear time algorithm achieving approximation roughly d ^1/3, due to Batu, Ergün, and Sahinalp [SODA 2006].

Our technical contribution is twofold. First, we rely on finding matches between substrings in the two strings, where two substrings are considered a match if their edit distance is relatively small, a prevailing technique in commonly used heuristics, such as PatternHunter of Ma, Tromp and Li [Bioinformatics, 2002]. Second, we effectively reduce the smoothed edit distance to a simpler variant of (worst-case) edit distance, namely, edit distance on permutations (a.k.a. Ulam’s metric). We are thus able to build on algorithms developed for the Ulam metric, whose much better algorithmic guarantees usually do not carry over to general edit distance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

€32.70 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: EUR 29.95; Price includes VAT (France)

eBook: EUR 117.69; Price includes VAT (France)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis

Approximating the Geometric Edit Distance

Article 25 April 2022

Longest Common Prefixes with k-Errors and Applications

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: A basic local alignment search tool. J. of Molecular Biology 215(3), 403–410 (1990)
Article Google Scholar
Andoni, A., Indyk, P., Krauthgamer, R.: Overcoming the ℓ₁ non-embeddability barrier: Algorithms for product metrics (manuscript, 2008)
Google Scholar
Andoni, A., Krauthgamer, R.: The computational hardness of estimating edit distance. In: 48th Annual IEEE Symposium on Foundations of Computer Science, pp. 724–734. IEEE, Los Alamitos (2007)
Google Scholar
Batu, T., Ergün, F., Kilian, J., Magen, A., Raskhodnikova, S., Rubinfeld, R., Sami, R.: A sublinear algorithm for weakly approximating edit distance. In: Proceedings of the Symposium on Theory of Computing, pp. 316–324 (2003)
Google Scholar
Batu, T., Ergün, F., Sahinalp, C.: Oblivious string embeddings and edit distance approximations. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 792–801 (2006)
Google Scholar
Bar-Yossef, Z., Jayram, T.S., Krauthgamer, R., Kumar, R.: Approximating edit distance efficiently. In: Proceedings of the Symposium on Foundations of Computer Science, pp. 550–559 (2004)
Google Scholar
Blum, A., Spencer, J.: Coloring random and semi-random k-colorable graphs. J. Algorithms 19(2), 204–234 (1995)
Article MathSciNet MATH Google Scholar
Charikar, M., Krauthgamer, R.: Embedding the ulam metric into ℓ₁. Theory of Computing 2(11), 207–224 (2006)
Article MathSciNet MATH Google Scholar
Feige, U., Kilian, J.: Heuristics for semirandom graph problems. J. Comput. Syst. Sci. 63(4), 639–673 (2001)
Article MathSciNet MATH Google Scholar
Frieze, A., McDiarmid, C.: Algorithmic theory of random graphs. Random Structures and Algorithms 10(1-2), 5–42 (1997)
Article MathSciNet MATH Google Scholar
Gollapudi, S., Panigrahy, R.: A dictionary for approximate string search and longest prefix search. In: 15th ACM international conference on Information and knowledge management, pp. 768–775. ACM, New York (2006)
Google Scholar
Masek, W.J., Paterson, M.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
Article MathSciNet MATH Google Scholar
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics 18(3), 440–445 (2002)
Article Google Scholar
Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)
Article Google Scholar
Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Engineering Bulletin 24(4), 19–27 (2001); Special issue on Text and Databases. Invited paper
Google Scholar
Spielman, D.A., Teng, S.-H.: Smoothed analysis: Motivation and discrete models. In: Dehne, F., Sack, J.-R., Smid, M. (eds.) WADS 2003. LNCS, vol. 2748. Springer, Heidelberg (2003)
Chapter Google Scholar
Spielman, D.A., Teng, S.-H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. ACM 51(3), 385–463 (2004)
Article MathSciNet MATH Google Scholar
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21(1), 168–173 (1974)
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

MIT, USA
Alexandr Andoni
Weizmann Institute and IBM Almaden, Israel
Robert Krauthgamer

Authors

Alexandr Andoni
View author publications
You can also search for this author in PubMed Google Scholar
Robert Krauthgamer
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computer Science, Reykjavik University, Kringlan 1, 103, Reykjavík, Iceland
Luca Aceto
Department of Computer Science, IT-Parken, University of Aarhus, Åbogade 34, 8200, ÅrhusN, Denmark
Ivan Damgård
Department of Computer Science, University of Liverpool, Ashton Building, L69 3BX, Liverpool, UK
Leslie Ann Goldberg
School of Computer Science, Reykjavik University, Kringlan 1, 103, Reykjavik, Iceland
Magnús M. Halldórsson
Department of Computer Science, Reykjavik University, Kringlan 1, 103, Reykjavík, Iceland
Anna Ingólfsdóttir
Université de Bordeaux-1, LaBRI, 351, Cours de la Libération, 33405, Talence cedex, France
Igor Walukiewicz

Copyright information

About this paper

Cite this paper

Andoni, A., Krauthgamer, R. (2008). The Smoothed Complexity of Edit Distance. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds) Automata, Languages and Programming. ICALP 2008. Lecture Notes in Computer Science, vol 5125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70575-8_30

Download citation

DOI: https://doi.org/10.1007/978-3-540-70575-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70574-1
Online ISBN: 978-3-540-70575-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

The Smoothed Complexity of Edit Distance

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis

Approximating the Geometric Edit Distance

Longest Common Prefixes with k-Errors and Applications

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

The Smoothed Complexity of Edit Distance

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

Algorithmic Framework for Approximate Matching Under Bounded Edits with Applications to Sequence Analysis

Approximating the Geometric Edit Distance

Longest Common Prefixes with k-Errors and Applications

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation