MINI Batch 3 First Page


TEXT ANALYSIS AND INFORMATION RETRIEVAL OF
HISTORICAL TAMIL ANCIENT DOCUMENTS USING
MACHINE TRANSLATION IN IMAGE ZONING
A MINI PROJECT REPORT

Submitted by
822320104006 JAYASRI S
822320104007 JAYASURYA K
822320104009 PAVITHRA G
822320104309 SARANYA M
in partial fulfilment for the award of the degree

of

BACHELOR OF ENGINEERING

in

COMPUTER SCIENCE AND ENGINEERING

VANDAYAR ENGINEERING COLLEGE,THANJAVUR

ANNA UNIVERSITY:CHENNAI 600025


MAY 2023
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE

Certified that this project report “TEXT ANALYSIS AND INFORMATION
RETRIEVAL OF HISTORICAL TAMIL ANCIENT DOCUMENTS
USING MACHINE TRANSLATION IN IMAGE ZONING” is the bonafide
work of “JAYASRI S (822320104006), JAYASURYA K (822320104007),
PAVITHRA G (822320104009), SARANYA M (822320104309)” who carried
out the project work under my supervision.

Mr. M.SARAVANAKUMAR, M.E.,


HEAD OF THE DEPARTMENT
DEPARTMENT OF CSE
Vandayar Engineering College
Thanjavur-613501

Submitted for the project viva voce examination held on ………………...

INTERNAL EXAMINER EXTERNAL EXAMINER

ABSTRACT
Digitization has become important for document preservation. Some
handwritten Tamil documents, such as land records, need to be preserved, and
we address the difficulty of preserving paper by digitizing them. The aim of
this project is to take a set of handwritten Tamil characters as input in image
form, process the characters, train a Convolutional Neural Network (CNN) to
recognize the patterns, and convert the recognized characters into a printed
document. The CNN attempts to determine whether the input data matches a
pattern that the network has learned. Optical Character Recognition (OCR)
deals with the crucial problem of handwritten character classification. To
overcome the difficulty of distinguishing between similar characters, a CNN is
expected to provide higher recognition accuracy. CNNs now play an important
role in nearly every aspect of computer vision. Here, a CNN is used to
recognize Tamil handwritten characters in offline mode. CNNs differ from the
traditional approach to Tamil Handwritten Character Recognition (THCR) in
how features are extracted across the stages of pre-processing, normalization,
feature extraction and classification. We have developed a CNN model from
scratch by training it on Tamil characters in offline mode. This work applies a
deep learning technique to offline THCR for digitization.
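
The pre-processing and feature extraction stages named above can be illustrated with a minimal sketch. It is not the model developed in this report, which is a CNN that learns its features during training. It assumes that "zoning" in the title refers to the common zone-density feature, where the character image is divided into a grid and the share of ink pixels in each zone is used as a feature. The names binarize and zone_densities, the threshold of 128 and the 2 x 2 grid are all hypothetical choices made for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black ink) to 255 (white paper)


def binarize(image: Image, threshold: int = 128) -> Image:
    """Pre-processing: mark ink pixels as 1 and background as 0.

    The threshold value is an assumption, not taken from the report.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


def zone_densities(binary: Image, rows: int, cols: int) -> List[float]:
    """Feature extraction: fraction of ink pixels in each zone of a rows x cols grid."""
    height, width = len(binary), len(binary[0])
    features = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            ink = sum(binary[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            area = (bottom - top) * (right - left)
            features.append(ink / area if area else 0.0)
    return features


# A toy 4x4 "character": a dark vertical stroke in the left half.
sample = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
features = zone_densities(binarize(sample), rows=2, cols=2)
print(features)  # [1.0, 0.0, 1.0, 0.0]
```

A feature vector of this kind could be passed to any classifier. A CNN instead takes the pre-processed image directly and learns comparable local features in its convolution layers.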


ACKNOWLEDGEMENT

First of all, I thank God for his blessing and grace on every success in
life and on this project.

The success of any work lies in the involvement and commitment of its
makers, and this project is no exception. At this juncture I would like to
acknowledge the many minds that made this project of ours a reality.

My grateful thanks to our beloved Chairman
Mr.S.GUNASEKARAVANDAYAR, M.Com and Correspondent
Mr.G.VIJAY PRAKASH, B.E., M.Tech., M.B.A for their continuous help
during our course period by arranging various activities.

Next, my grateful thanks to our Principal Dr.SAMUNDEESWARI,
M.Tech., Ph.D for the support extended to us to successfully complete the
course.

The encouragement and support of our Head of the Department
Mr.M.SARAVANAKUMAR, M.E., guided us with spirit throughout
the course period. Whenever clouds of disappointment hung above, I always
had one never-failing ray of hope that showed the path, time and again, in the
form of our Head of the Department.

My special thanks for the invaluable help and guidance given by my
internal guide Mr.M.SARAVANAKUMAR, M.E., Assistant Professor,
Department of Computer Science and Engineering, Vandayar Engineering
College, Thanjavur.

I would be failing in my duty if I did not mention the wholehearted
support and technical assistance extended to me by all the FACULTY
MEMBERS AND TECHNICAL STAFF of the Computer Science and
Engineering Department, Vandayar Engineering College, Thanjavur.

TABLE OF CONTENTS

S.NO    TITLE

        ABSTRACT
        LIST OF FIGURES
        LIST OF ABBREVIATIONS
1       INTRODUCTION
        1.1 INTRODUCTION – TAMIL LANGUAGE
        1.2 BRAHMI AND VATTEZHUTHU SCRIPTS
2       LITERATURE REVIEW
        2.1 CREATION OF ORIGINAL TAMIL CHARACTER DATASET THROUGH
            SEGREGATION OF ANCIENT PALM LEAF MANUSCRIPTS IN MEDICINE
        2.2 THE DATASET FOR PRINTED BRAHMI WORD RECOGNITION
        2.3 IMPROVING THE QUALITY AND READABILITY OF ANCIENT BRAHMI
            STONE INSCRIPTIONS
3       SYSTEM ANALYSIS
        3.1 EXISTING SYSTEM
        3.2 PROPOSED SYSTEM
        3.3 SYSTEM REQUIREMENTS
4       PROJECT DESCRIPTION
        4.1 RECOGNITION OF HANDWRITTEN
        4.2 SYSTEM ARCHITECTURE
            4.2.1 STRUCTURE OF THE SYSTEM
            4.2.2 IMAGE CAPTURING
            4.2.3 IMAGE PREPROCESSING
            4.2.4 DATA SET TRAINING
            4.2.5 CHARACTER RECOGNITION
            4.2.6 UNICODE TEXT
            4.2.7 RETRIEVE FROM THE DATABASE
        4.3 ANCIENT BRAHMI CHARACTER RECOGNITION SYSTEM
5       PROPOSED ALGORITHM
        5.1 ZONING
        5.2 TESTING AND TRAINING
            5.2.1 CHARACTER RECOGNITION
6       SYSTEM DESIGN
        6.1 INTRODUCTION
        6.2 MODERN TAMIL SCRIPT
        6.3 ANCIENT TAMIL SCRIPT
7       IMPLEMENTATION AND RESULTS
        7.1 PREPROCESSING OF IMAGE
        7.2 FEATURE EXTRACTION
        7.3 RESULTS
            7.3.1 SCREEN SHOTS
8       CONCLUSION
        APPENDIX – I
        APPENDIX – II

LIST OF FIGURES
FIG.NO      FIGURES

Fig. 4.1    Architecture of ancient Tamil Brahmi character recognition
            system for stone inscription
Fig. 4.2    A Vattezhuthu script in temple epigraphy
Fig. 4.3    Using Sobel’s edge detection method for segmentation
Fig. 4.4    Line segmentation
Fig. 4.5    Character segmentation
Fig. 4.6    Vattezhuthu character set
Fig. 5.2    Unknown characters in the dataset
Fig. 5.3    Maa character from different writers
