
MINI PROJECT REPORT

(PCA20P02L)

ON

SPEECH AND TEXT RECOGNITION

(CONVERSION OF SPEECH INTO TEXT)

By

RANJITH S

(RA2132241020088)

Submitted to the

DEPARTMENT OF COMPUTER SCIENCE AND APPLICATIONS (MCA)

Under the guidance of

Dr. S. UMA SHANKARI MCA, M.Phil., Ph.D., NET, SET

Assistant Professor, Department of Computer Applications

MASTER OF COMPUTER APPLICATIONS

SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

Ramapuram, Chennai.
NOVEMBER 2022
COLLEGE OF SCIENCE & HUMANITIES

Ramapuram, Chennai.

Department of Computer Science and Applications (MCA)

BONAFIDE CERTIFICATE

Certified that this project report titled “SPEECH AND TEXT RECOGNITION”
is the bonafide work of RANJITH S (Reg. No: RA2132241020088), who carried
out the mini project work under my supervision.

Signature of Internal Guide Signature of Head of the Department

Signature of External Examiner


TABLE OF CONTENTS

S.NO TITLE

1 INTRODUCTION
1.1 PROJECT INTRODUCTION

2 WORKING ENVIRONMENT
2.1 HARDWARE REQUIREMENTS
2.2 SOFTWARE REQUIREMENTS
2.3 SYSTEM SOFTWARE REQUIREMENT ANALYSIS

3 SYSTEM ANALYSIS
3.1 FEASIBILITY STUDY
3.2 EXISTING SYSTEM
3.3 DRAWBACKS OF EXISTING SYSTEM
3.4 PROPOSED SYSTEM
3.5 BENEFITS OF PROPOSED SYSTEM
3.6 SCOPE OF THE PROJECT

4 SYSTEM DESIGN
4.1 DATA FLOW DIAGRAM
4.2 USE CASE DIAGRAM
4.3 ARCHITECTURE DIAGRAM

5 PROJECT DESCRIPTION
5.1 OBJECTIVE
5.2 MODULE DESCRIPTION
5.3 IMPLEMENTATION

6 SYSTEM TESTING
6.1 TESTING DEFINITION
6.2 TESTING OBJECTIVE
6.3 TYPES OF TESTING

7 CONCLUSION
7.1 SUMMARY
7.2 FUTURE ENHANCEMENTS

8 APPENDIX
8.1 SCREENSHOTS
8.2 CODING
8.3 DATA DICTIONARY

9 BIBLIOGRAPHY AND REFERENCES


LIST OF FIGURES

FIG.NO TITLE

1.1.1 SPEECH AND TEXT CONVERSION
1.1.2 TEXT TO SPEECH CONVERSION
3.4 PROPOSED SYSTEM
4.1.1 TEXT TO SPEECH DATAFLOW DIAGRAM
4.1.2 SPEECH TO TEXT DATAFLOW DIAGRAM
4.2.1 TEXT TO SPEECH USECASE DIAGRAM
4.2.2 SPEECH TO TEXT USECASE DIAGRAM
4.3.1 ARCHITECTURE DIAGRAM 1
4.3.2 ARCHITECTURE DIAGRAM 2
8.1.1 HOMEPAGE
8.1.2 TEXT TO SPEECH CONVERSION PAGE
8.1.3 SPEECH TO TEXT CONVERSION PAGE
8.1.4 TRANSLATOR PAGE
CHAPTER 1
INTRODUCTION

1.1 PROJECT INTRODUCTION:


Speech and text recognition has three features: text to speech, speech to text, and a translator.
These help deaf and dumb people to read or hear content that is typed or spoken into the
application. The "Text to Speech" feature helps people who cannot speak to express words
through the mobile phone's speakers. The "Speech to Text" feature allows deaf people to
understand the words and feelings of others through their smartphone screen. The "Translator"
feature renders a person's message and feelings from one language to another, which makes
life easier and more comfortable. Statistics show that there are currently over 325,000
health-related mobile apps on app marketplaces. According to these statistics, healthcare app
developers are keen on developing projects such as fitness apps, calorie-burning trackers,
online pharmacy apps, and online doctor-consulting apps. Very few developers build these
health apps for the benefit of the people in need, so such apps are in huge demand in the
market. Speech recognition is an interdisciplinary subfield of computer science and
computational linguistics that develops methodologies and technologies enabling the
recognition and translation of spoken language into text by computers, with the main benefit
of searchability. It is also known as automatic speech recognition (ASR), computer speech
recognition, or speech to text (STT). It incorporates knowledge and research from the computer
science, linguistics, and computer engineering fields. The reverse process is speech synthesis.

1.1.1 SPEECH AND TEXT CONVERSION
Some speech recognition systems require "training" (also called "enrollment"), where an
individual speaker reads text or isolated vocabulary into the system. The system analyzes the
person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting
in increased accuracy. Systems that do not use training are called "speaker-independent"
systems; systems that use training are called "speaker-dependent". Speech recognition
applications include voice user interfaces such as voice dialing (e.g. "call home"), call routing
(e.g. "I would like to make a collect call"), domotic appliance control, keyword search (e.g.
finding a podcast where particular words were spoken), simple data entry (e.g. entering a credit
card number), preparation of structured documents (e.g. a radiology report), determining
speaker characteristics, speech-to-text processing (e.g. word processors or emails), and aircraft
control (usually termed direct voice input).

1.1.2 TEXT TO SPEECH CONVERSION

CHAPTER 2

WORKING ENVIRONMENT

2.1 HARDWARE REQUIREMENTS


• Operating System: Windows 10

• Processor: Intel Core i5 (8th Gen) Processor

• Hard Drive: 1TB.

• Memory (RAM): 8 GB.

2.2 SOFTWARE REQUIREMENTS

• Java 19

• Frontend: XML

• Android Studio

2.3 SYSTEM SOFTWARE

REQUIREMENT ANALYSIS

A requirement is a feature of a system, or a description of something the system is capable of
doing, in order to fulfil the system's purpose. Requirements analysis provides the appropriate
mechanism for understanding what the customer wants, analyzing the needs, assessing
feasibility, negotiating a reasonable solution, specifying the solution unambiguously, validating
the specification, and managing the requirements as they are translated into an operational system.

JAVA

Java is a class-based, object-oriented programming language that is designed to have as few
implementation dependencies as possible. It is a general-purpose programming language intended
to let application developers write once, run anywhere (WORA), meaning that compiled Java code
can run on all platforms that support Java without the need for recompilation. Java applications are
typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of the
underlying computer architecture. The syntax of Java is similar to C and C++, but it has fewer
low-level facilities than either of them. The Java runtime provides dynamic capabilities (such as
reflection and runtime code modification) that are typically not available in traditional compiled
languages. As of 2019, Java was one of the most popular programming languages in use
according to GitHub, particularly for client-server web applications, with a reported 9 million
developers. Java was originally developed by James Gosling at Sun Microsystems (which has
since been acquired by Oracle) and released in 1995 as a core component of Sun Microsystems'
Java platform. The original and reference implementation Java compilers, virtual machines, and
class libraries were originally released by Sun under proprietary licenses. As of May 2007, in
compliance with the specifications of the Java Community Process, Sun had relicensed most of
its Java technologies under the GNU General Public License. Oracle offers its own HotSpot
Java Virtual Machine; however, the official reference implementation is the OpenJDK JVM,
which is free open-source software, is used by most developers, and is the default JVM for
almost all Linux distributions.
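The dynamic capabilities mentioned above can be seen in a short, self-contained sketch (the class and method names here are invented purely for the demo): a method is looked up by its name at runtime via reflection and invoked dynamically, rather than being bound at compile time.

```java
import java.lang.reflect.Method;

// Minimal demonstration of Java's runtime reflection.
public class ReflectionDemo {
    public String greet(String name) {
        return "Hello, " + name;
    }

    public static void main(String[] args) throws Exception {
        Object target = new ReflectionDemo();
        // Discover the method by name at runtime and invoke it dynamically.
        Method m = target.getClass().getMethod("greet", String.class);
        Object result = m.invoke(target, "Java");
        System.out.println(result); // prints "Hello, Java"
    }
}
```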
Features in Java

One of the biggest reasons why Java is so popular is the platform independence. Programs can run
on several different types of computers; as long as the computer has a Java Runtime Environment
(JRE) installed, a Java program can run on it. Most types of computers will be compatible with a
JRE including PCs running on Windows, Macintosh computers, Unix or Linux computers, and
large mainframe computers, as well as mobile phones. Since it has been around for so long, some
of the biggest organizations in the world are built using the language. Many banks, retailers,
insurance companies, utilities, and manufacturers all use Java.

ANDROID STUDIO

Android Studio is the official integrated development environment (IDE) for Google's Android
operating system, built on JetBrains' IntelliJ IDEA software and designed specifically for
Android development. It is available for download on Windows, macOS, and Linux-based
operating systems and, as of 2020, as a subscription-based service. It replaced the Eclipse
Android Development Tools (E-ADT) as the primary IDE for native Android application
development. Android Studio was announced on May 16, 2013 at the Google I/O conference. It
was in early access preview from version 0.1 in May 2013, then entered beta with version 0.8,
released in June 2014. The first stable build, version 1.0, was released in December 2014.

Features in Android Studio


Android Studio supports the same programming languages as IntelliJ (and CLion),
e.g. Java, C++, and more with extensions, such as Go; Android Studio 3.0 or later supports
Kotlin and "all Java 7 language features and a subset of Java 8 language features that vary by
platform version." External projects backport some Java 9 features. While IntelliJ states that
Android Studio supports all released Java versions, including Java 12, it is not clear to what
level Android Studio supports versions up to Java 12 (the documentation mentions partial Java
8 support). At least some new language features up to Java 12 are usable in Android. Once an
app has been compiled with Android Studio, it can be published on the Google Play Store. The
application has to comply with the Google Play Store developer content policy.

CHAPTER 3

SYSTEM ANALYSIS
3.1 FEASIBILITY STUDY

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility
study of the proposed system is carried out, to ensure that the proposed system is not a burden to
the user. For feasibility analysis, some understanding of the major requirements for the system is
essential. The three key considerations involved in the feasibility analysis are:

• Economic Feasibility
• Technical Feasibility
• Social Feasibility

3.1.1. Economic Feasibility


This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development
of the system is limited, and the expenditures must be justified. The developed system is well
within the budget, which was achieved because most of the technologies used are freely
available; only the customized products had to be purchased.

3.1.2. Technical Feasibility


This study is carried out to check the technical feasibility, that is, the technical requirements of
the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed system
must have modest requirements, as only minimal or no changes are required for implementing
this system.
3.1.3. Social Feasibility

This aspect of the study checks the level of acceptance of the system by the user. This includes
the process of training the user to use the system efficiently. The user must not feel threatened by
the system, but must instead accept it as a necessity. The level of acceptance by the users depends
solely on the methods that are employed to educate the user about the system and to make him
familiar with it. His level of confidence must be raised so that he is also able to offer some
constructive criticism, which is welcomed, as he is the final user of the system.

3.2 EXISTING SYSTEM

3.3 DRAWBACKS OF EXISTING SYSTEM

“Write by voice, Speech to text, tell me, Speechify”


These are some of the well-rated apps currently available in the app market.

• These applications don't actually help this community in any way; they are merely named
with the prefix of deaf and dumb, but in practice just increase the volume, which is not
going to help them.

• Most of the existing systems carry the title of deaf and dumb assistance, but do not serve
the relevant services that this title calls for, whereas our system provides the services that
the title promises.

3.4 PROPOSED SYSTEM


All the existing systems so far support only a single form of assistance, whereas our system
serves both forms of assistance within a single application. Most of the existing systems carry
the title of deaf and dumb assistance, but do not serve the relevant services that this title calls
for, whereas our system provides the services that the title promises. Our system also provides
a translator, which makes the application more helpful and enhances your experience with it.

Phase 1: Train the Convolutional Neural Network (CNN) with the training dataset.

Phase 2: Store the obtained weights and parameters in the file system.

Phase 3: Pre-process the input image.

Phase 4: Load the stored weights into the CNN.

Phase 5: Classify the image.
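The store-and-reload phases above can be sketched in plain Java. This is a toy stand-in: a real CNN stores many weight tensors, and the flat `double[]` and temporary file here are invented for illustration only.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hedged sketch of Phase 2 (persist trained parameters to the file
// system) and Phase 4 (reload them before classification).
public class WeightStore {
    static void save(double[] weights, Path file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(weights);          // Phase 2: store parameters
        }
    }

    static double[] load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(file))) {
            return (double[]) in.readObject(); // Phase 4: reload parameters
        }
    }

    public static void main(String[] args) throws Exception {
        double[] trained = {0.12, -0.53, 0.99};        // stand-in "weights"
        Path file = Files.createTempFile("weights", ".bin");
        save(trained, file);
        double[] restored = load(file);
        System.out.println(Arrays.equals(trained, restored)); // prints "true"
    }
}
```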

3.4 PROPOSED SYSTEM

3.5 BENEFITS OF PROPOSED SYSTEM

We achieved a final accuracy of 95.0% on our dataset. We improved our prediction after
implementing two layers of algorithms, in which we verify and predict symbols that are very
similar to each other.

This gives us the ability to detect almost all the symbols, provided that they are shown properly,
there is no noise in the background, and the lighting is adequate.

3.6 SCOPE OF THE PROJECT:


We are planning to achieve higher accuracy even in the case of complex backgrounds by trying
out various background subtraction algorithms. This system provides three features: text to
speech, speech to text, and a translator. It helps deaf people read a message, which is displayed
on the mobile phone screen, and helps dumb people speak a message through the mobile phone
speakers. It also helps both deaf and dumb people translate the message into other languages.
There are many more works that can be carried out as extensions of this project. This system
predicts the needs of the mute person, but future systems may be developed that communicate
with the mute person's mobile device, allowing the system to learn the needs of the user and
thereby enabling the development of recommendatory systems, as they hold the relevant data
related to the mute person that can easily be learned through the neural network model.

CHAPTER 4

SYSTEM DESIGN

● Systems design is the process of defining elements of a system like modules, architecture,
components and their interfaces and data for a system based on the specified
requirements. It is the process of defining, developing, and designing systems that
satisfy the specific needs and requirements of a business or organization.
● A systemic approach is required for a coherent and well-running system. Bottom-Up or
Top-Down approach is required to take into account all related variables of the system. A
designer uses the modelling languages to express the information and knowledge in a
structure of system that is defined by a consistent set of rules and definitions. The designs
can be defined in graphical or textual modelling languages.

DESIGN METHODS:

● Architectural design: describes the views, models, behaviour, and structure of the
system.
● Logical design: represents the data flow, inputs, and outputs of the system. Example:
ER diagrams (Entity Relationship Diagrams).
● Physical design: defines a) how users add information to the system and how the
system presents information back to the user; b) how the data is modelled and stored
within the system; c) how data moves through the system, and how data is validated,
secured, and/or transformed as it flows through and out of the system.
4.1 DATA FLOW DIAGRAM

DFDs make it easy to depict the business requirements of applications by representing the
sequence of process steps and flow of information using a graphical representation or visual
representation rather than a textual description. When used through an entire development
process, they first document the results of business analysis. Then, they refine the
representation to show how information moves through, and is changed by, application
flows.
Both automated and manual processes are represented.

DATAFLOW DIAGRAM

4.1.1 TEXT TO SPEECH

4.1.2 SPEECH TO TEXT


4.2 USE CASE DIAGRAM

Use-case diagrams describe the high-level functions and scope of a system. These diagrams also
identify the interactions between the system and its actors. The use cases and actors in use-case
diagrams describe what the system does and how the actors use it, but not how the system
operates internally. Use-case diagrams illustrate and define the context and requirements of
either an entire system or the important parts of the system. You can model a complex system
with a single use- case diagram, or create many use-case diagrams to model the components of
the system. You would typically develop use-case diagrams in the early phases of a project and
refer to them throughout the development process.

USECASE DIAGRAM

4.2.1 TEXT TO SPEECH

4.2.2 SPEECH TO TEXT

4.3 ARCHITECTURE DIAGRAM

An architecture diagram is a visual representation of all the elements that make up part, or all, of
a system. Above all, it helps the engineers, designers, stakeholders and anyone else involved in
the project understand a system or app’s layout. This diagram gives a top-level view of a
software’s structure. To elaborate, it generally includes various components that interact with
each other and how the software interacts with external databases and servers. It’s useful for
explaining software to clients and stakeholders, and for assessing the impact of adding new
features or of upgrading, replacing, or merging existing applications.

4.3.1 ARCHITECTURE DIAGRAM 1

4.3.2 ARCHITECTURE DIAGRAM 2

CHAPTER 5

PROJECT DESCRIPTION:

5.1 OBJECTIVE:

● Statistics show that there are currently over 325,000 health-related mobile apps on app
marketplaces.
● According to these statistics, healthcare app developers are keen on developing projects
such as fitness apps, calorie-burning trackers, online pharmacy apps, and online
doctor-consulting apps.
● Very few developers build these health apps for the benefit of the people in need, so
these apps are in huge demand in the market.

5.2 MODULE DESCRIPTION:


This application has three modules through which all the operations takes place.

● Module 1 – Text-to-Speech

● Module 2 – Speech-to-Text

● Module 3 – Translator

MODULE 1 – TEXT-TO-SPEECH:


• Text-To-Speech is the first module. This module directs you to the next page, where a
text box is provided; the text that has to be pronounced as output is entered here.
• The entered text, or the text from an attached file, is read by the text recognizer, and the
words from the text are matched and stored.
• The speech synthesizer detects the words from the text recognizer, and the collected data
is manipulated and arranged according to grammatical rules.
• The synthesizer transforms the collected and arranged data into a waveform, which is
then played as output through the phone speakers.

MODULE 2 – SPEECH-TO-TEXT:
• Speech-To-Text is the second module. This module directs you to the next page, where a
listen button is provided; you can hold the button and speak the words that should be
displayed as output.
• The recorded audio is analyzed and broken down into segments, and the start and end of
the speech are detected.
• Noise is removed from the recorded audio, and the remaining signal is matched with the
correct corresponding words.
• The converted waveform is rendered as text and displayed on the mobile screen.
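The endpoint analysis step above (finding where the speech starts and ends in the recording) can be sketched with a simple frame-energy threshold. This is a hedged illustration; the frame size, threshold, and synthetic signal below are invented values, not ones taken from the project.

```java
// Sketch of endpoint detection: mark speech as the region whose
// short-frame mean energy exceeds a silence threshold.
public class EndpointDetector {
    // Returns {startSample, endSample} of the detected speech region,
    // or {-1, -1} if every frame is below the threshold.
    static int[] findSpeech(double[] samples, int frameSize, double threshold) {
        int start = -1, end = -1;
        for (int i = 0; i + frameSize <= samples.length; i += frameSize) {
            double energy = 0;
            for (int j = i; j < i + frameSize; j++) {
                energy += samples[j] * samples[j];
            }
            energy /= frameSize;               // mean energy of this frame
            if (energy > threshold) {
                if (start < 0) start = i;      // first loud frame: speech begins
                end = i + frameSize;           // extend through last loud frame
            }
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // Synthetic clip: 100 silent samples, 100 "speech" samples, 100 silent.
        double[] clip = new double[300];
        for (int i = 100; i < 200; i++) clip[i] = Math.sin(i * 0.3);
        int[] bounds = findSpeech(clip, 10, 0.01);
        System.out.println(bounds[0] + ".." + bounds[1]); // prints "100..200"
    }
}
```

A real recognizer would use overlapping frames and an adaptive noise floor, but the same energy comparison underlies the "start and end of the audio" analysis described above.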

MODULE 3 – TRANSLATOR:

• Translator is the third module; this module helps us convert the output from the other
modules.

• This module uses the Google Translate API, through which the output from the other
modules can be translated from one language to another.

5.3 IMPLEMENTATION
Project implementation is the process of putting a project plan into action to produce the
deliverables, otherwise known as the products or services, for clients or stakeholders. It takes
place after the planning phase, during which a team determines the key objectives for the project,
as well as the timeline and budget. Implementation involves coordinating resources and

measuring performance to ensure the project remains within its expected scope and budget. It
also involves handling any unforeseen issues in a way that keeps a project running smoothly.
To implement a project effectively, project managers must consistently communicate with a team
to set and adjust priorities as needed while maintaining transparency about the project's status
with the clients or any key stakeholders. Implementation is the stage in the project where the
theoretical design is turned into a working system, giving the users confidence that the new
system will work efficiently and effectively. It involves careful planning, investigation of the
current system and its constraints on implementation, design of methods to achieve the
changeover, and an evaluation of changeover methods. Apart from planning, the major tasks
in preparing for implementation are the education and training of users. The implementation
process begins with preparing a plan for the implementation of the system. According to this
plan, the activities are carried out, discussions are held regarding the equipment and resources,
and any additional equipment needed to implement the new system is acquired. In this system,
no additional resources are needed. Implementation is the final and most important phase.
The most critical stage in achieving a successful new system is giving the users confidence that
the new system will work and be effective. The system can be implemented only after thorough
testing is done and it is found to be working according to the specification.

CHAPTER 6

SYSTEM TESTING:

6.1 TESTING DEFINITION:

● Testing is a process of executing a program with the intent of finding an error. A good
test case is one that has a high probability of finding an as-yet-undiscovered error, and a
successful test is one that uncovers such an error. System testing is the stage of
implementation that is aimed at ensuring that the system works accurately and
efficiently as expected before live operation commences. It verifies that the whole set of
programs hangs together. System testing requires a test plan that consists of several key
activities and steps for program, string, and system testing, and it is important in
adopting a successful new system. This is the last chance to detect and correct errors
before the system is installed for user acceptance testing.
● The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise, the program or the project cannot be said to be complete.
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design, and coding.
● The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies, or the finished product; it is the process of
exercising software with the intent of ensuring that the software does not fail in an
unacceptable manner. Testing looks for errors and missing operations, and is also a
complete verification to determine whether the objectives are met and the user
requirements are satisfied. The ultimate aim is quality assurance.
6.2 TESTING OBJECTIVE:

● To find errors in the developed software; to check that the functions work according to
the specification and that the required behaviour and performance are fulfilled; and to
check the reliability and quality of the software.
● We feed the input images, after pre-processing, to our model for training and testing,
applying all the operations mentioned above.
● The prediction layer estimates how likely the image is to fall under each of the classes.
The output is normalized between 0 and 1 such that the values across the classes sum
to 1. We achieved this using the SoftMax function.
● At first, the output of the prediction layer will be somewhat far from the actual value. To
improve it, we trained the network using labelled data. Cross-entropy is a performance
measure used in classification. It is a continuous function that is positive at values that
differ from the labelled value, and is zero exactly when it equals the labelled value.
Therefore, we optimized the cross-entropy by minimizing it as close to zero as possible;
to do this, we adjust the weights of our neural network layers. TensorFlow has a built-in
function to calculate the cross-entropy.
● Having defined the cross-entropy function, we optimized it using gradient descent, in
fact with one of the best gradient descent optimizers, the Adam optimizer.
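The SoftMax and cross-entropy behaviour described above can be illustrated in plain Java. This is a toy sketch, not the project's TensorFlow code: SoftMax normalises raw scores so they lie in (0, 1) and sum to 1, and cross-entropy against a one-hot label is zero exactly when the true class has probability 1.

```java
public class PredictionMath {
    // SoftMax: exponentiate each score and normalise so the outputs
    // lie in (0, 1) and sum to 1.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s); // numeric stability
        double sum = 0;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = Math.exp(scores[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    // Cross-entropy against a one-hot label: -log(probability of the true
    // class). It is 0 when that probability is 1, and grows as it shrinks.
    static double crossEntropy(double[] probs, int trueClass) {
        return -Math.log(probs[trueClass]);
    }

    public static void main(String[] args) {
        double[] probs = softmax(new double[] {2.0, 1.0, 0.1});
        double total = probs[0] + probs[1] + probs[2];
        System.out.println(Math.abs(total - 1.0) < 1e-9);              // prints "true"
        System.out.println(crossEntropy(new double[] {1.0, 0.0}, 0) == 0.0); // prints "true"
    }
}
```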

6.3 TYPES OF TESTING:

● Unit testing
● Integration testing
● Functional testing
● System testing
● White box testing
● Black box testing

UNIT TESTING:
● Unit testing is conducted to verify the functional performance of each modular
component of the software. Unit testing focuses on the smallest unit of the software
design, i.e., the module. The white-box testing techniques were heavily employed for
unit testing.

● Unit tests perform basic tests at component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a
business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.

INTEGRATION TESTING:
● Integration testing is a systematic technique for constructing the program structure while
at the same time conducting tests to uncover errors associated with interfacing; i.e.,
integration testing is the complete testing of the set of modules that make up the
product. The objective is to take unit-tested modules and build a program structure. The
tester should identify critical modules, and critical modules should be tested as early as
possible. One approach is to wait until all the units have passed testing, and then
combine and test them together; this approach evolved from the unstructured testing of
small programs. Another strategy is to construct the product in increments of tested
units: a small set of modules is integrated together and tested, to which another module
is added and tested in combination, and so on. The advantage of this approach is that
interface discrepancies can be easily found and corrected.

FUNCTIONAL TESTS:
● Functional test cases involve exercising the code with nominal input values for which
the expected results are known, as well as with boundary values and special values, such
as logically related inputs, files of identical elements, and empty files.
● Three types of tests in functional testing:
i. Performance test
ii. Stress test
iii. Structure test

SYSTEM TEST:

System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system test is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.

WHITE BOX TESTING:


This testing is also called glass box testing. In this testing, knowing the internal operation of a
product, tests can be conducted to ensure that "all gears mesh", that is, that the internal operation
performs according to specification and all internal components have been adequately exercised.
It is a test case design method that uses the control structure of the procedural design to derive
test cases. Basis path testing is a white box technique. Basis path testing involves:
● Flow graph notation
● Cyclomatic complexity
● Deriving test cases
● Graph matrices

BLACK BOX TESTING:


In this testing, knowing the specific functions that a product has been designed to perform, tests
can be conducted to demonstrate that each function is fully operational while at the same time
searching for errors in each function. It fundamentally focuses on the functional requirements of
the software. The steps involved in black box test case design are:
● Graph-based testing methods
● Equivalence partitioning
● Boundary value analysis
● Comparison testing
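As a hedged illustration of equivalence partitioning and boundary value analysis, consider testing a small validator; the method and its 1–120 range are invented for this example. The equivalence partitions are below-range, in-range, and above-range, and the interesting inputs sit at the boundaries 0, 1, 120, and 121.

```java
public class AgeValidator {
    // Example rule under test: valid ages are 1..120 inclusive.
    static boolean isValidAge(int age) {
        return age >= 1 && age <= 120;
    }

    public static void main(String[] args) {
        // Equivalence partitioning: one representative per class.
        System.out.println(isValidAge(-5));  // below range: prints "false"
        System.out.println(isValidAge(40));  // in range: prints "true"
        System.out.println(isValidAge(500)); // above range: prints "false"
        // Boundary value analysis: just outside and just inside each edge.
        System.out.println(isValidAge(0));   // prints "false"
        System.out.println(isValidAge(1));   // prints "true"
        System.out.println(isValidAge(120)); // prints "true"
        System.out.println(isValidAge(121)); // prints "false"
    }
}
```

The same idea applies to this application's black box tests: for instance, partitioning spoken input into silent, normal, and overly long recordings, and testing at the edges of each partition.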

CHAPTER 7
7.1 SUMMARY:
● To conclude the description of the project: the project, developed using Java and
Android Studio, is based on the requirement specification of the user and the analysis of
the existing system, with flexibility for future enhancement.

● We achieved a final accuracy of 95.0% on our dataset. We improved our prediction
after implementing two layers of algorithms, in which we verify and predict symbols
that are very similar to each other.

● This gives us the ability to detect almost all the symbols, provided that they are shown
properly, there is no noise in the background, and the lighting is adequate.

● We have achieved an accuracy of 95.8% in our model using only layer 1 of our
algorithm, and using the combination of layer 1 and layer 2 we achieve an accuracy of
98.0%, which is a better accuracy then most of the current research papers on speech
recognition.
● They also used CNNs for their recognition systems. It should be noted that our model
does not use any background subtraction algorithm, while some of the models discussed
above do.

● So, once we implement background subtraction in our project, the accuracies may vary.
On the other hand, most of the above projects use Kinect devices, but our main aim was
to create a project that can be used with readily available resources.

7.2 FUTURE ENHANCEMENTS:


● As per the statistics, users are only interested in applications that have a good and
creative graphical user interface, so we plan to enhance the graphical interface.
● As our application has a translator feature, we want to include a greater number of
languages so that a larger group of people who know different languages can use it.
● In future, a quick-suggestion feature can be added to this application.
● We are also thinking of improving the pre-processing to predict voice in noisy conditions
with higher accuracy.
● This project can be enhanced by being built as a web/mobile application so that users can
conveniently access it. Also, the existing project can be extended to work for other native
languages with the right amount of data and training. This project implements a
finger-spelling translator; however, recognition is also performed on a contextual basis,
where each gesture can represent an object or a verb.
● Speech recognition systems have developed from classifying only static signs and
alphabets to systems that can successfully recognize dynamic movements that come in
continuous sequences of images. Researchers nowadays are paying more attention to
building a large vocabulary for speech recognition systems.

CHAPTER 8
8.1 SCREENSHOTS:
HOMEPAGE:

8.1.1 HOMEPAGE

This is the first page of our application.

● The three modules of our application are displayed here.


● Clicking on a module navigates to that module.

MODULE 1-TEXT TO SPEECH:

8.1.2 TEXT TO SPEECH CONVERSION PAGE

• On clicking the text-to-speech module, it navigates to the next activity.
• Enter the text you want spoken in the text box and click the Speak button.
• On clicking the Clear button, the text entered in the text box is cleared. The output is
played through the mobile speakers.

MODULE 2-SPEECH TO TEXT:

8.1.3 SPEECH TO TEXT CONVERSION PAGE

On clicking speech to text, it navigates to the next activity.


● If the microphone is not enabled, the app requests permission to enable the microphone
the first time.

● On clicking the Start button, it allows you to record the audio.
● On clicking the Stop button, it stops recording and displays the recorded audio as text.

MODULE 3-TRANSLATOR:

8.1.4 TRANSLATOR PAGE

• On clicking TRANSLATOR, it navigates to the next activity.


• In the given text box, enter or paste the text that you want to translate.
• Click the Translate button; the translated text is displayed below.

8.2 CODING:

XML CODING FOR HOMEPAGE:

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".FirstPageActivity">

    <TextView
        android:id="@+id/textview"
        android:layout_width="253dp"
        android:layout_height="60dp"
        android:layout_marginTop="84dp"
        android:gravity="center"
        android:text="@string/app_name"
        android:textSize="24sp"
        android:textColor="@color/black"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/button"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="96dp"
        android:gravity="center"
        android:text="TEXT-TO-SPEECH"
        android:textSize="18sp"
        android:onClick="textToSpeechOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.553"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/textview" />

    <Button
        android:id="@+id/button2"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="52dp"
        android:gravity="center"
        android:text="SPEECH-TO-TEXT"
        android:textSize="18sp"
        android:onClick="speechToTextOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.553"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/button" />

    <Button
        android:id="@+id/button3"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="60dp"
        android:gravity="center"
        android:text="TRANSLATOR"
        android:textSize="18sp"
        android:onClick="translatorOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/button2" />

</androidx.constraintlayout.widget.ConstraintLayout>

41
CONNECTIVITY CODE FROM ONE MODULE TO ANOTHER:

// Required imports
import androidx.appcompat.app.AppCompatActivity;
import android.content.Intent;
import android.os.Bundle;
import android.view.View;

public class FirstPageActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.first_page);
    }

    public void speechToTextOnclick(View view) {
        Intent i = new Intent(getApplicationContext(), SpeechToTextActivity.class);
        startActivity(i);
    }

    public void textToSpeechOnclick(View view) {
        Intent i = new Intent(getApplicationContext(), TextToSpeechActivity.class);
        startActivity(i);
    }

    public void translatorOnclick(View view) {
        Intent i = new Intent(getApplicationContext(), Translator.class);
        startActivity(i);
    }
}

XML CODE FOR ANDROID MANIFEST PERMISSION:

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.speechtotext">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.Speechtotext">

        <activity android:name=".SpeechToTextActivity" />
        <activity android:name=".TextToSpeechActivity" />
        <activity android:name=".Translator" />

        <activity android:name=".FirstPageActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>

TEXT-TO-SPEECH XML CODING:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="20dp"
    android:gravity="center"
    tools:context=".TextToSpeechActivity">

    <EditText
        android:id="@+id/et_input"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:hint="Enter The Text"
        android:textAlignment="center"
        android:gravity="center_horizontal"
        android:lines="5"
        android:background="@drawable/bg_round" />

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_marginTop="10dp">

        <Button
            android:id="@+id/bt_convert"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_weight="1"
            android:text="speak" />

        <androidx.appcompat.widget.AppCompatSpinner
            android:layout_width="10dp"
            android:layout_height="wrap_content" />

        <Button
            android:id="@+id/bt_clear"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_weight="1"
            android:text="clear" />
    </LinearLayout>
</LinearLayout>

TEXT-TO-SPEECH JAVA CODING:

import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;
import androidx.appcompat.app.AppCompatActivity;
import java.util.Locale;

public class TextToSpeechActivity extends AppCompatActivity {

    EditText edtext;
    Button btconvert, btclear;
    TextToSpeech textToSpeech;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.text_to_speech);
        edtext = findViewById(R.id.et_input);
        btconvert = findViewById(R.id.bt_convert);
        btclear = findViewById(R.id.bt_clear);

        textToSpeech = new TextToSpeech(getApplicationContext(),
                new TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status == TextToSpeech.SUCCESS) {
                    int lang = textToSpeech.setLanguage(Locale.ENGLISH);
                }
            }
        });

        btconvert.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                String s = edtext.getText().toString();
                int speech = textToSpeech.speak(s, TextToSpeech.QUEUE_FLUSH, null);
            }
        });

        btclear.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                edtext.setText("");
            }
        });
    }
}

SPEECH-TO-TEXT XML CODING:

<?xml version="1.0" encoding="utf-8"?>
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".SpeechToTextActivity">

    <TextView
        android:id="@+id/output"
        android:layout_width="300dp"
        android:layout_height="80dp"
        android:layout_marginTop="144dp"
        android:gravity="center"
        android:textColor="#0C0C0C"
        android:textSize="22sp"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.495"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/rec"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="60dp"
        android:onClick="startRec"
        android:text="startRec"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/output"
        tools:ignore="OnClick" />

    <Button
        android:id="@+id/stop"
        android:layout_width="108dp"
        android:layout_height="48dp"
        android:layout_marginTop="44dp"
        android:onClick="stopRec"
        android:text="stopRec"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.512"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/rec"
        tools:ignore="OnClick" />

</androidx.constraintlayout.widget.ConstraintLayout>

SPEECH-TO-TEXT JAVA CODING:

import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;
import android.Manifest;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.view.View;
import android.widget.TextView;
import java.util.ArrayList;

public class SpeechToTextActivity extends AppCompatActivity {

    TextView txt;
    SpeechRecognizer recognizer;
    Intent intent;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.speech_to_text);
        checkpermission();
        convert();
        txt = findViewById(R.id.output);
        System.out.println("Inside on create");
    }

    public void checkpermission() {
        if (!(ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED)) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[]{Manifest.permission.RECORD_AUDIO}, 1);
        }
        if (!(ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.INTERNET) == PackageManager.PERMISSION_GRANTED)) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[]{Manifest.permission.INTERNET}, 1);
        }
    }

    public void convert() {
        recognizer = SpeechRecognizer.createSpeechRecognizer(SpeechToTextActivity.this);
        intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override
            public void onReadyForSpeech(Bundle params) {
                System.out.println("Inside on ready for speech");
            }

            @Override
            public void onBeginningOfSpeech() {
                System.out.println("Inside on Beginning of speech");
            }

            @Override
            public void onRmsChanged(float rmsdB) {
                System.out.println("Inside on rms changed");
            }

            @Override
            public void onBufferReceived(byte[] buffer) {
                System.out.println("Inside on Buffer received");
            }

            @Override
            public void onEndOfSpeech() {
                System.out.println("Inside on End of speech");
            }

            @Override
            public void onError(int error) {
                System.out.println("Inside on Error" + error);
            }

            @Override
            public void onResults(Bundle results) {
                System.out.println("Inside on Results");
                ArrayList<String> words =
                        results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                if (words != null) {
                    for (String word : words) System.out.println(word);
                    txt.setText(words.get(0));
                }
            }

            @Override
            public void onPartialResults(Bundle partialResults) {
                System.out.println("Inside on Partial Results");
            }

            @Override
            public void onEvent(int eventType, Bundle params) {
                System.out.println("Inside on event");
            }
        });
    }

    public void startRec(View view) {
        System.out.println("Inside start rec");
        txt.setText("");
        recognizer.startListening(intent);
    }

    public void stopRec(View view) {
        recognizer.stopListening();
    }
}

TRANSLATOR XML CODING:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    tools:context=".MainActivity">

    <EditText
        android:id="@+id/inputToTranslate"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:layout_marginTop="48dp"
        android:layout_marginBottom="16dp"
        android:ems="10"
        android:hint="Enter text"
        android:inputType="text" />

    <Button
        android:id="@+id/translateButton"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:layout_marginBottom="32dp"
        android:text="Translate" />

    <TextView
        android:id="@+id/translatedTv"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_gravity="center"
        android:textSize="16sp" />

</LinearLayout>

MANIFEST PERMISSION:

<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android">

    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />

    <!-- Activities below as usual -->
</manifest>

TRANSLATOR JAVA CODING:

import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;
import android.os.Bundle;
import android.os.StrictMode;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;
import android.widget.TextView;
import androidx.appcompat.app.AppCompatActivity;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.translate.Translate;
import com.google.cloud.translate.TranslateOptions;
import com.google.cloud.translate.Translation;
import java.io.IOException;
import java.io.InputStream;

public class MainActivity extends AppCompatActivity {

    private EditText inputToTranslate;
    private TextView translatedTv;
    private String originalText;
    private String translatedText;
    private boolean connected;
    Translate translate;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        inputToTranslate = findViewById(R.id.inputToTranslate);
        translatedTv = findViewById(R.id.translatedTv);
        Button translateButton = findViewById(R.id.translateButton);

        translateButton.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                if (checkInternetConnection()) {
                    // If there is an internet connection, get the translate service and start translation:
                    getTranslateService();
                    translate();
                } else {
                    // If not, display a "no connection" warning:
                    translatedTv.setText(getResources().getString(R.string.no_connection));
                }
            }
        });
    }

    public void getTranslateService() {
        StrictMode.ThreadPolicy policy = new StrictMode.ThreadPolicy.Builder().permitAll().build();
        StrictMode.setThreadPolicy(policy);
        try (InputStream is = getResources().openRawResource(R.raw.credentials)) {
            // Get credentials:
            final GoogleCredentials myCredentials = GoogleCredentials.fromStream(is);
            // Set credentials and get the translate service:
            TranslateOptions translateOptions =
                    TranslateOptions.newBuilder().setCredentials(myCredentials).build();
            translate = translateOptions.getService();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    public void translate() {
        // Get the input text to be translated:
        originalText = inputToTranslate.getText().toString();
        Translation translation = translate.translate(originalText,
                Translate.TranslateOption.targetLanguage("tr"),
                Translate.TranslateOption.model("base"));
        translatedText = translation.getTranslatedText();
        // The translated text is set to the TextView:
        translatedTv.setText(translatedText);
    }

    public boolean checkInternetConnection() {
        // Check internet connection:
        ConnectivityManager connectivityManager =
                (ConnectivityManager) getSystemService(Context.CONNECTIVITY_SERVICE);
        // Means that we are connected to a network (mobile or wi-fi):
        connected = connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_MOBILE).getState()
                == NetworkInfo.State.CONNECTED
                || connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_WIFI).getState()
                == NetworkInfo.State.CONNECTED;
        return connected;
    }
}
8.3 DATA DICTIONARY

Global vocabulary:
Support your global user base with Speech-to-Text’s extensive language support in over 125
languages and variants.

Streaming speech recognition:


Receive real-time speech recognition results as the API processes the audio input streamed from
your application’s microphone or sent from a prerecorded audio file (inline or through Cloud
Storage).

Speech adaptation:
Customize speech recognition to transcribe domain-specific terms and rare words by providing
hints, and boost your transcription accuracy for specific words or phrases. Automatically convert
spoken numbers into addresses, years, currencies, and more using classes.

Speech-to-Text On-Prem:
Have full control over your infrastructure and protected speech data while leveraging Google's
speech recognition technology on-premises, right in your own private data centers. Contact sales
to get started.
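The phrase-hint idea behind speech adaptation can be sketched with a toy rescorer (an illustration of the concept only, not the actual Speech-to-Text adaptation API; the candidate list and boost value below are made up): candidates containing hinted domain phrases are preferred over the recognizer's default ranking.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class PhraseHintDemo {

    // Toy rescoring: each hinted phrase found in a candidate adds a fixed boost.
    static String pickBest(List<String> candidates, Set<String> hints, double boost) {
        String best = candidates.get(0);
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.size(); i++) {
            // Base score: earlier candidates are ranked higher by the recognizer.
            double score = -i;
            for (String hint : hints) {
                if (candidates.get(i).toLowerCase().contains(hint.toLowerCase())) {
                    score += boost;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = candidates.get(i);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList("recognize speech", "wreck a nice beach");
        // With no hints the top-ranked candidate wins;
        // hinting the domain phrase flips the choice.
        System.out.println(pickBest(candidates, Set.of(), 2.0));
        System.out.println(pickBest(candidates, Set.of("nice beach"), 2.0));
    }
}
```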

Multichannel recognition:
Speech-to-Text can recognize distinct channels in multichannel situations (e.g., video conference)
and annotate the transcripts to preserve the order.
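The channel-annotation idea can be sketched with a hypothetical utterance record (an illustration of the concept only, not the actual API's response format): per-channel transcripts are merged into one transcript, ordered by start time and tagged with their channel.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MultichannelDemo {

    // Hypothetical utterance: which channel it came from, when it started, what was said.
    record Utterance(int channel, double startSec, String text) {}

    // Merge per-channel transcripts into one annotated transcript, ordered by time.
    static List<String> merge(List<Utterance> utterances) {
        List<Utterance> sorted = new ArrayList<>(utterances);
        sorted.sort(Comparator.comparingDouble(Utterance::startSec));
        List<String> out = new ArrayList<>();
        for (Utterance u : sorted) {
            out.add("[ch" + u.channel() + "] " + u.text());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Utterance> call = List.of(
                new Utterance(2, 1.5, "Hi, how can I help?"),
                new Utterance(1, 0.2, "Hello?"),
                new Utterance(1, 3.0, "I have a billing question."));
        merge(call).forEach(System.out::println);
    }
}
```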

Noise robustness:
Speech-to-Text can handle noisy audio from many environments without requiring additional
noise cancellation.

Domain-specific models:
Choose from a selection of trained models for voice control, phone call transcription, and video
transcription, optimized for domain-specific quality requirements. For example, our enhanced
phone call model is tuned for audio originating from telephony, such as phone calls recorded at
an 8 kHz sampling rate.

Content filtering:
Profanity filter helps you detect inappropriate or unprofessional content in your audio data and
filter out profane words in text results.
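The masking idea can be sketched as a simple word filter (a toy illustration only; the blocked word and masking style are assumptions, and the real profanity filter is far more sophisticated):

```java
import java.util.Set;

public class ProfanityFilterDemo {

    // Toy filter: mask every blocked word, keeping only its first letter,
    // similar in spirit to the "f***"-style masking in transcription results.
    static String filter(String text, Set<String> blocked) {
        String[] words = text.split(" ");
        StringBuilder out = new StringBuilder();
        for (String word : words) {
            if (blocked.contains(word.toLowerCase())) {
                out.append(word.charAt(0));
                for (int i = 1; i < word.length(); i++) out.append('*');
            } else {
                out.append(word);
            }
            out.append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(filter("this darn transcript", Set.of("darn")));
        // → "this d*** transcript"
    }
}
```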

Transcription evaluation:
Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on
your configuration.

Automatic punctuation (beta):


Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods).

Speaker diarization (beta):


Know who said what by receiving automatic predictions about which of the speakers in a
conversation spoke each utterance.

CHAPTER 9

BIBLIOGRAPHY AND REFERENCES:

1. Oviatt S. Predicting Spoken Disfluencies During Human-Computer Interaction. Computer
Speech and Language 9(1):19-35, January 1995.
2. http://www.igntu.ac.in/eContent/IGNTU-eContent-815947141046-MA-Linguistics-4HarjitSingh-ComputationalLinguistics.pdf
3. Marsh E, Wauchope K, Gurney JO. Human-Machine Dialogue for Multi-Modal Decision
Support Systems. Technical Report AIC-94-032, NCARAI, US Naval Research Laboratory,
Washington, DC.
4. https://www.researchgate.net/publication/304651244_VOICE_RECOGNITION_SYSTEM_SPEECH-TO-TEXT
5. Cohen PR. The Role of Natural Language in a Multimodal Interface. Proceedings of the ACM
Symposium on User Interface Software and Technology, Monterey, California, ACM Press,
November 15-18, 1992.
6. https://redirect.cs.umbc.edu/~mgrass2/dissert/annbib.html#speech
7. https://en.wikipedia.org/wiki/Speech_recognition
8. https://github.com/topics/speech-to-text
9. http://www.ling.helsinki.fi/~gwilcock/Tartu-2003/L7-Speech/JSAPI/Recognition.html#:~:text=A%20speech%20recognizer%20is%20a,of%20supporting%20classes%20and%20interfaces.

