Abstract
The text in stylistic documents may have different orientations; text lines may be curved and need not be parallel to one another within a page. As a result, extracting and subsequently recognizing individual text lines and words in such documents is a difficult task. Thinning, which reduces characters to single-pixel-wide strokes, is one of the most crucial phases of text recognition, and its success depends on how well it retains the original character shape. Existing thinning algorithms struggle with distinct non-isolated boundaries and complex character shapes in different scripts, and they produce unwanted edges. This paper presents an improved thinning algorithm that does not produce unwanted edges, so that the path of the text can be obtained for a curved-text straightening system for Optical Character Recognition (OCR). In experiments on documents with English or Hindi curved text, visual inspection of the results shows that the proposed method yields promising results.
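To make the idea of thinning concrete, here is a minimal sketch of the classic Zhang-Suen parallel thinning scheme in plain Python. It is not the improved algorithm proposed in this paper, whose details the abstract does not give; it only illustrates how a thick binary stroke is peeled down to roughly one-pixel width. The function name `zhang_suen_thin`, the list-of-lists image format, and the assumption that the image has a one-pixel background border are all choices made for this illustration.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of lists of 0/1) with the Zhang-Suen scheme.

    Assumes the outermost rows and columns are background (0).
    Returns a new image; the input is left unchanged.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Delete all flagged pixels at once (parallel sub-iteration).
            for r, c in to_clear:
                img[r][c] = 0
            if to_clear:
                changed = True
    return img


# Example: a 3-pixel-thick horizontal bar inside a 7 x 12 image.
bar = [[1 if 2 <= r <= 4 and 2 <= c <= 9 else 0 for c in range(12)]
       for r in range(7)]
thin = zhang_suen_thin(bar)
for row in thin:
    print("".join("#" if v else "." for v in row))
```

On this toy bar the result is a one-pixel-wide stroke along the middle row, slightly shortened at both ends. On real characters, classic schemes like this one can leave spurious branches at corners and junctions, which is the kind of unwanted edge the abstract refers to.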
Copyright information
© 2012 Springer India Pvt. Ltd.
About this paper
Cite this paper
Singh, B., Goswami, S., Goyal, P., Mittal, A. (2012). A Robust Thinning Algorithm for Straightening of Curved Text Line. In: Deep, K., Nagar, A., Pant, M., Bansal, J. (eds) Proceedings of the International Conference on Soft Computing for Problem Solving (SocProS 2011) December 20-22, 2011. Advances in Intelligent and Soft Computing, vol 131. Springer, New Delhi. https://doi.org/10.1007/978-81-322-0491-6_83
DOI: https://doi.org/10.1007/978-81-322-0491-6_83
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-0490-9
Online ISBN: 978-81-322-0491-6
eBook Packages: Engineering, Engineering (R0)