Ranjith S - Mini Project
(PCA20P02L)
ON
SPEECH AND TEXT RECOGNITION
By
RANJITH S
(RA2132241020088)
Submitted to the
COLLEGE OF SCIENCE & HUMANITIES
Ramapuram, Chennai.
NOVEMBER 2022
BONAFIDE CERTIFICATE
Certified that this project report titled “SPEECH AND TEXT RECOGNITION”
is the bonafide work of RANJITH S (Reg. No: RA2132241020088), who carried
out the mini project work under my supervision.
TABLE OF CONTENTS
1 INTRODUCTION
  1.1 PROJECT INTRODUCTION
2 WORKING ENVIRONMENT
  2.1 REQUIREMENT ANALYSIS
  2.2 OBJECTIVES
  2.3 PROJECT CATEGORY
  2.4 TOOLS/ENVIRONMENT
3 SYSTEM ANALYSIS
  3.1 FEASIBILITY STUDY
    3.1.1 Purpose
    3.1.2 Scope
    3.1.3 Benefits
    3.1.4 Abbreviations
    3.1.5 References
4 SYSTEM DESIGN
  4.1 GANTT CHART
  4.2 USE CASE DIAGRAM
5 PROJECT DESCRIPTION
  5.1 OBJECTIVE
  5.2 Produc
  5.3 IMPLEMENTATION
6 SYSTEM TESTING
  6.1 TESTING DEFINITION
  6.2 TESTING OBJECTIVE
7 CONCLUSION
  7.1 SUMMARY
  7.2 FUTURE ENHANCEMENTS
8 APPENDIX
  8.1 SCREENSHOTS
    8.1.3 SPEECH TO TEXT CONVERSION PAGE
    8.1.4 TRANSLATOR PAGE
  8.2 CODING
9 BIBLIOGRAPHY AND REFERENCES
CHAPTER 1
INTRODUCTION
1.1.1 SPEECH AND TEXT CONVERSION
Some speech recognition systems require "training" (also called "enrollment"), where an
individual speaker reads text or isolated vocabulary into the system. The system analyzes the
person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting
in increased accuracy. Systems that do not use training are called "speaker-independent" [1]
systems; systems that use training are called "speaker-dependent". Speech recognition
applications include voice user interfaces such as voice dialing (e.g. "call home"), call routing
(e.g. "I would like to make a collect call"), domestic appliance control, keyword search (e.g.
finding a podcast where particular words were spoken), simple data entry (e.g. entering a credit
card number), preparation of structured documents (e.g. a radiology report), determining
speaker characteristics, speech-to-text processing (e.g. word processors or emails), and aircraft
control (usually termed direct voice input).
CHAPTER 2
WORKING ENVIRONMENT
• Java 19
• Frontend XML
• Android Studio
REQUIREMENT ANALYSIS
Requirements are features of a system, or descriptions of something the system must be capable
of doing in order to fulfil the system's purpose. Requirement analysis provides the appropriate
mechanism for understanding what the customer wants, analyzing the needs, assessing feasibility,
negotiating a reasonable solution, specifying the solution unambiguously, validating the
specification, and managing the requirements as they are translated into an operational system.
JAVA
Java is a class-based, object-oriented programming language that is designed to have as few
implementation dependencies as possible. It is a general-purpose programming language intended
to let application developers write once, run anywhere (WORA), meaning that compiled Java code
can run on all platforms that support Java without the need for recompilation. Java applications are
typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of the
underlying computer architecture. The syntax of Java is similar to C and C++, but it has fewer
low-level facilities than either of them. The Java runtime provides dynamic capabilities (such as
reflection and runtime code modification) that are typically not available in traditional compiled
languages. As of 2019, Java was one of the most popular programming languages in use,
particularly for client-server web applications, with a reported 9 million developers. Java was
originally developed by James Gosling at Sun Microsystems (which has since been acquired by
Oracle) and released in 1995 as a core component of Sun Microsystems' Java platform. The
original and reference implementation Java compilers, virtual machines, and class libraries were
originally released by Sun under proprietary licenses. As of May 2007, in compliance with the
specifications of the Java Community Process, Sun had relicensed most of its Java technologies
under the GNU General Public License. Oracle offers its own HotSpot Java virtual machine;
however, the official reference implementation is the OpenJDK JVM, which is free open-source
software, is used by most developers, and is the default JVM for almost all Linux distributions.
Features in Java
One of the biggest reasons Java is so popular is its platform independence. Programs can run
on many different types of computers: as long as the computer has a Java Runtime Environment
(JRE) installed, a Java program can run on it. Most types of computers are compatible with a
JRE, including PCs running Windows, Macintosh computers, Unix or Linux computers, and
large mainframe computers, as well as mobile phones. Since the language has been around for so
long, some of the biggest organizations in the world are built using it; many banks, retailers,
insurance companies, utilities, and manufacturers use Java.
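The platform independence described above can be seen in a minimal sketch: the same compiled bytecode asks the JVM at run time which platform it is on. The class and method names below are illustrative, not part of the project's code.

```java
// Minimal illustration of "write once, run anywhere": the identical
// compiled bytecode runs unchanged on any JVM and discovers its host
// platform only at run time through standard system properties.
public class PlatformCheck {
    public static String describe() {
        String os = System.getProperty("os.name");      // e.g. "Linux", "Windows 10"
        String vm = System.getProperty("java.vm.name"); // the hosting virtual machine
        return "Running on " + os + " via " + vm;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

The same class file prints a different platform name on Windows, Linux or macOS without recompilation.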
ANDROID STUDIO
Android Studio is the official integrated development environment (IDE) for Google's Android
operating system, built on JetBrains' IntelliJ IDEA software and designed specifically for
Android development. It is available for download on Windows, macOS and Linux based
operating systems. It is a replacement for the Eclipse Android Development Tools (E-ADT)
as the primary IDE for native Android application development. Android Studio was announced
on May 16, 2013 at the Google I/O conference. It was in early access preview stage starting
from version 0.1 in May 2013, then entered beta stage starting from version 0.8, which was
released in June 2014. The first stable build, version 1.0, was released in December 2014.
CHAPTER 3
SYSTEM ANALYSIS
3.1 FEASIBILITY STUDY
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility
study of the proposed system is carried out. This is to ensure that the proposed system is not
a burden to the user. For feasibility analysis, some understanding of the major requirements for
the system is essential. Three key considerations involved in the feasibility analysis are:
• Economic Feasibility
• Technical Feasibility
• Social Feasibility
• These applications do not actually help this community in any way; such apps are merely
named with the prefix "deaf and dumb" but in practice only increase the volume, which
does not help these users at all.
• Most of the existing systems carry titles such as "deaf and dumb assistance", but they do
not provide the services that such a title implies. Our system provides the services that
the title promises.
Phase-1: Train the Convolutional Neural Network (CNN) with the train dataset.
Phase-2: Store the obtained weights and parameters in the file system.
3.4 PROPOSED SYSTEM
We achieved a final accuracy of 95.0% on our data set. We improved our prediction after
implementing two layers of algorithms, wherein we verify and predict symbols which are
more similar to each other.
This gives us the ability to detect almost all the symbols, provided that they are shown properly,
there is no noise in the background, and lighting is adequate.
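As a hedged sketch of the two-layer idea described above: a first classifier predicts a symbol, and if that symbol belongs to a group of easily confused symbols, a second, specialized classifier re-examines only that group. The classifiers here are stand-in functions, not the project's actual CNN.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TwoLayerClassifier {
    // Layer 1 gives an initial prediction; layer 2 refiners exist only for
    // symbols known to be confusable, and re-decide within that group.
    public static String classify(double[] features,
                                  Function<double[], String> layer1,
                                  Map<String, Function<double[], String>> refiners) {
        String first = layer1.apply(features);
        Function<double[], String> refiner = refiners.get(first);
        return refiner == null ? first : refiner.apply(features);
    }

    // Demo with hypothetical stand-ins: "M" and "N" are treated as a
    // confusable pair, separated by a single (made-up) feature threshold.
    public static String demo(double feature) {
        Map<String, Function<double[], String>> refiners = new HashMap<>();
        refiners.put("M", f -> f[0] > 0.5 ? "M" : "N"); // layer 2 for the M/N pair
        Function<double[], String> layer1 = f -> "M";   // stand-in first-layer model
        return classify(new double[] { feature }, layer1, refiners);
    }
}
```

Symbols outside any confusion group pass through layer 1 unchanged, which keeps the second layer cheap.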
person, but future systems may be developed that could communicate with the mute person's
mobile device, allowing the system to learn the needs of the user, thereby provisioning the
development of recommendatory systems, as they have the relevant data related to the mute
person that can easily be learned through the neural network model.
CHAPTER 4
SYSTEM DESIGN
● Systems design is the process of defining elements of a system like modules, architecture,
components and their interfaces and data for a system based on the specified
requirements. It is the process of defining, developing and designing systems which
satisfies the specific needs and requirements of a business or organization.
● A systemic approach is required for a coherent and well-running system. Bottom-Up or
Top-Down approach is required to take into account all related variables of the system. A
designer uses the modelling languages to express the information and knowledge in a
structure of system that is defined by a consistent set of rules and definitions. The designs
can be defined in graphical or textual modelling languages.
DESIGN METHODS:
● Architectural design: To describe the views, models, behavior, and structure of the
system.
● Logical design: To represent the data flow, inputs and outputs of the system. Example:
ER Diagrams (Entity Relationship Diagrams).
● Physical design: Defined as a) How users add information to the system and how the
system represents information back to the user. b) How the data is modelled and stored
within the system. c) How data moves through the system, how data is validated, secured
and/or transformed as it flows through and out of the system.
4.1 DATA FLOW DIAGRAM
DFDs make it easy to depict the business requirements of applications by representing the
sequence of process steps and flow of information using a graphical representation or visual
representation rather than a textual description. When used through an entire development
process, they first document the results of business analysis. Then, they refine the
representation to show how information moves through, and is changed by, application
flows.
Both automated and manual processes are represented.
DATA FLOW DIAGRAM
Use-case diagrams describe the high-level functions and scope of a system. These diagrams also
identify the interactions between the system and its actors. The use cases and actors in use-case
diagrams describe what the system does and how the actors use it, but not how the system
operates internally. Use-case diagrams illustrate and define the context and requirements of
either an entire system or the important parts of the system. You can model a complex system
with a single use-case diagram, or create many use-case diagrams to model the components of
the system. You would typically develop use-case diagrams in the early phases of a project and
refer to them throughout the development process.
USE CASE DIAGRAM
4.2.2 SPEECH TO TEXT
An architecture diagram is a visual representation of all the elements that make up part, or all, of
a system. Above all, it helps the engineers, designers, stakeholders and anyone else involved in
the project understand a system or app's layout. This diagram gives a top-level view of a
software's structure. To elaborate, it generally includes the various components that interact with
each other and shows how the software interacts with external databases and servers. It's useful for
explaining software to clients and stakeholders, and for assessing the impact of adding new features
or upgrading, replacing, or merging existing applications.
4.3.2 ARCHITECTURE DIAGRAM 2
CHAPTER 5
PROJECT DESCRIPTION:
5.1 OBJECTIVE:
● Statistics show that there are currently over 325,000 health-related mobile apps on app
marketplaces.
● As per these statistics, healthcare app developers are keen on developing projects such as
fitness apps, calorie-burning trackers, online pharmacy apps, and online doctor-consultation
apps.
● Only very few developers build such health apps for the benefit of the people in need, so
these apps are in huge demand in the market.
● Module 1 – Text-to-Speech
● Module 2 – Speech-to-Text
● Module 3 – Translator
MODULE 1 – TEXT-TO-SPEECH:
• The synthesizer will transform the collected and arranged data into a waveform; the
waveform will then be given as output through the phone speakers.
MODULE 2 – SPEECH-TO-TEXT:
• Speech-To-Text is the second module. It directs you to the next page, where a listen
button is given; you can hold the button and speak the data that has to be displayed as
output.
• The recorded audio will be analyzed and broken down into lines, and the start and end of
the speech will be found.
• Noise will be removed from the recorded audio, which is then matched with the correct
corresponding words.
• The converted waveform will be rendered as text and displayed on the mobile screen.
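The start/end analysis described above can be sketched with a simple short-term-energy threshold. This is an illustrative simplification (the class name and threshold are hypothetical), not the detector the Android recognizer actually uses.

```java
// Illustrative endpoint detection: mark speech start and end where the
// short-term energy of audio frames crosses a threshold. Real recognizers
// do this with far more care (adaptive thresholds, hangover frames, etc.).
public class EndpointDetector {
    // frames[i] holds the samples of the i-th audio frame.
    // Returns {startFrame, endFrame} of the speech region, or null if none.
    public static int[] findSpeech(double[][] frames, double threshold) {
        int start = -1, end = -1;
        for (int i = 0; i < frames.length; i++) {
            double energy = 0;
            for (double sample : frames[i]) {
                energy += sample * sample;   // accumulate squared amplitude
            }
            energy /= frames[i].length;      // mean energy of this frame
            if (energy > threshold) {
                if (start < 0) start = i;    // first loud frame = speech start
                end = i;                     // last loud frame so far = speech end
            }
        }
        return start < 0 ? null : new int[] { start, end };
    }
}
```

Frames below the threshold before the start and after the end are treated as the silence that brackets the utterance.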
MODULE 3 – TRANSLATOR:
• Translator is the third module; it helps us convert the output from the other modules.
• This module uses the Google Translate API, through which the output from the other
modules can be translated from one language to another.
5.3 IMPLEMENTATION
Project implementation is the process of putting a project plan into action to produce the
deliverables, otherwise known as the products or services, for clients or stakeholders. It takes
place after the planning phase, during which a team determines the key objectives for the project,
as well as the timeline and budget. Implementation involves coordinating resources and
measuring performance to ensure the project remains within its expected scope and budget. It
also involves handling any unforeseen issues in a way that keeps a project running smoothly.
To implement a project effectively, project managers must consistently communicate with the team
to set and adjust priorities as needed, while maintaining transparency about the project's status
with the clients and any key stakeholders. Implementation is the stage in the project where the
theoretical design is turned into a working system, giving users confidence that the new system
will work efficiently and effectively. It involves careful planning, investigation of the current
system and its constraints on implementation, design of methods to achieve the changeover, and
an evaluation of changeover methods. Apart from planning, the major tasks of preparing for
implementation are education and training of users. The implementation process begins with
preparing a plan for the implementation of the system. According to this plan, the activities are
carried out, discussions are held regarding the equipment and resources, and any additional
equipment needed to implement the new system is acquired. In this system, no additional
resources are needed. Implementation is the final and most important phase. The most critical
stage in achieving a successful new system is giving the users confidence that the new system
will work and be effective. The system can be implemented only after thorough testing is done
and it is found to be working according to the specification.
CHAPTER 6
SYSTEM TESTING:
6.1 TESTING DEFINITION:
● Testing is a process of executing a program with the intent of finding an error. A good
test case is one that has a high probability of finding an as-yet-undiscovered error, and a
successful test is one that uncovers such an error. System testing is the stage of
implementation aimed at ensuring that the system works accurately and efficiently as
expected before live operation commences. It verifies that the whole set of programs
hangs together. System testing consists of several key activities and steps covering
program, string and system testing, and is important in adopting a successful new system.
This is the last chance to detect and correct errors before the system is installed for user
acceptance testing.
● The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise, the program or the project is not said to be complete.
Software testing is a critical element of software quality assurance and represents the
ultimate review of specification, design and coding. A good test case design is one that
has a high probability of finding a yet-undiscovered error.
● The purpose of testing is to discover errors. Testing is the process of trying to discover
every conceivable fault or weakness in a work product. It provides a way to check the
functionality of components, sub-assemblies or the finished product; it is the process of
exercising software with the intent of ensuring that the software meets its requirements
and does not fail in an unacceptable manner. Testing also provides a complete verification
to determine whether the objectives are met and the user requirements are satisfied. The
ultimate aim is quality assurance.
6.2 TESTING OBJECTIVE:
● To find errors in the developed software; to check that the functions work according to
the specification and that the required behavior and performance are fulfilled; and to
check the reliability and quality of the software.
● We feed the input images, after pre-processing, to our model for training and testing,
applying all the operations mentioned above.
● The prediction layer estimates how likely the image is to fall under one of the classes.
The output is normalized so that each value lies between 0 and 1 and the values across
the classes sum to 1. We achieve this using the softmax function.
● At first, the output of the prediction layer will be somewhat far from the actual value. To
make it better, we train the network using labelled data. Cross-entropy is a performance
measure used in classification. It is a continuous function which is positive at values
that differ from the labelled value and is zero exactly when it equals the labelled value.
Therefore, we optimize the cross-entropy by minimizing it as close to zero as possible;
to do this, we adjust the weights of our network layers. TensorFlow has an inbuilt
function to calculate the cross-entropy.
● Having defined the cross-entropy function, we optimize it using gradient descent,
specifically one of the best gradient descent optimizers, the Adam optimizer.
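The normalization and the loss described in the bullets above can be written out directly. This is a plain-Java illustration of the two formulas (class and method names are illustrative), not the report's TensorFlow code.

```java
public class LossDemo {
    // Softmax: normalize raw class scores into probabilities that sum to 1.
    public static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) max = Math.max(max, l); // shift for numerical stability
        double sum = 0;
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    // Cross-entropy against a one-hot label: zero only when the predicted
    // probability of the true class is exactly 1, positive otherwise.
    public static double crossEntropy(double[] probs, int trueClass) {
        return -Math.log(probs[trueClass]);
    }
}
```

Minimizing the cross-entropy therefore pushes the softmax probability of the labelled class toward 1, which is exactly what the gradient-descent step adjusts the weights to achieve.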
● Unit testing
● Integration testing
● Functional testing
● System testing
● White box testing
● Black box testing
UNIT TESTING:
● Unit testing is conducted to verify the functional performance of each modular
component of the software. Unit testing focuses on the smallest unit of the software
design, i.e., the module. White-box testing techniques were heavily employed for
unit testing.
● Unit tests perform basic tests at component level and test a specific business process,
application, and/or system configuration. Unit tests ensure that each unique path of a
business process performs accurately to the documented specifications and contains
clearly defined inputs and expected results.
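A minimal sketch of what such a unit test looks like, for a hypothetical module-level helper. The normalizer below is illustrative, not part of the project's code.

```java
public class TextNormalizer {
    // Module under test: collapse whitespace and lowercase input text,
    // as a speech-to-text post-processing step might.
    public static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ").toLowerCase();
    }

    // A unit test exercises the smallest unit in isolation with known
    // inputs and documented expected results, failing loudly on mismatch.
    public static void main(String[] args) {
        if (!normalize("  Hello   WORLD ").equals("hello world"))
            throw new AssertionError("whitespace/case normalization failed");
        if (!normalize("ok").equals("ok"))
            throw new AssertionError("already-normal input must pass through");
        System.out.println("unit tests passed");
    }
}
```

Each test pins one unique path through the unit, so a regression points directly at the module that broke.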
INTEGRATION TESTING:
● Integration testing is a systematic technique for constructing the program structure while
at the same time conducting tests to uncover errors associated with interfacing; i.e.,
integration testing is the complete testing of the set of modules which make up the
product. The objective is to take unit-tested modules and build a program structure; the
tester should identify critical modules, and critical modules should be tested as early as
possible. One approach is to wait until all the units have passed testing, then combine
and test them together; this approach evolved from unstructured testing of small
programs. Another strategy is to construct the product in increments of tested units: a
small set of modules is integrated and tested, another module is added and tested in
combination, and so on. The advantage of this approach is that interface discrepancies
can be found and corrected easily.
FUNCTIONAL TESTS:
● Functional test cases involved exercising the code with nominal input values for which
the expected results are known, as well as boundary values and special values, such as
logically related inputs, files of identical elements, and empty files.
● Three types of tests in Functional testing:
i. Performance Test
ii. Stress Test
iii. Structure Test
SYSTEM TEST:
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system test is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
CHAPTER 7
7.1 SUMMARY:
● To conclude, the project, developed using Java and Android Studio, is based on the
requirement specification of the user and the analysis of the existing system, with
flexibility for future enhancement.
● We achieved a final accuracy of 95.0% on our data set. We improved our prediction
after implementing two layers of algorithms, wherein we verify and predict symbols
which are more similar to each other.
● This gives us the ability to detect almost all the symbols, provided that they are shown
properly, there is no noise in the background, and lighting is adequate.
● We achieved an accuracy of 95.8% in our model using only layer 1 of our algorithm,
and using the combination of layer 1 and layer 2 we achieve an accuracy of 98.0%,
which is better than most of the current research papers on speech recognition.
● The papers referred to above also used CNNs for their recognition systems. It should be
noted that our model does not use any background subtraction algorithm, while some of
the models mentioned above do.
● So, once we try to implement background subtraction in our project, the accuracies may
vary. On the other hand, most of the above projects use Kinect devices, but our main aim
was to create a project which can be used with readily available resources.
CHAPTER 8
8.1 SCREENSHOTS:
HOMEPAGE:
8.1.1 HOMEPAGE
MODULE 1-TEXT TO SPEECH:
• On clicking the text-to-speech module, it navigates to the next activity.
• Enter the text you want spoken in the text box and click the speak button.
• On clicking the clear button, it clears the text entered in the text box.
• The output will be played through the mobile speakers.
8.1.3 SPEECH TO TEXT CONVERSION PAGE
● On clicking the start button, it allows you to record the audio.
● On clicking the stop button, it stops and displays the recorded audio as text.
MODULE 3-TRANSLATOR:
8.1.4 TRANSLATOR PAGE
8.2 CODING:
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".SpeechToTextActivity">

    <TextView
        android:id="@+id/textview"
        android:layout_width="253dp"
        android:layout_height="60dp"
        android:layout_marginTop="84dp"
        android:gravity="center"
        android:text="@string/app_name"
        android:textSize="24sp"
        android:textColor="@color/black"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/button"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="96dp"
        android:gravity="center"
        android:text="TEXT-TO-SPEECH"
        android:textSize="18sp"
        android:onClick="textToSpeechOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.553"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/textview" />

    <Button
        android:id="@+id/button2"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="52dp"
        android:gravity="center"
        android:text="SPEECH-TO-TEXT"
        android:textSize="18sp"
        android:onClick="speechToTextOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.553"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/button" />

    <!-- Translator button: wired to translatorOnclick in the activity. -->
    <Button
        android:id="@+id/button3"
        android:layout_width="317dp"
        android:layout_height="74dp"
        android:layout_marginTop="60dp"
        android:gravity="center"
        android:text="TRANSLATOR"
        android:textSize="18sp"
        android:onClick="translatorOnclick"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/button2" />
</androidx.constraintlayout.widget.ConstraintLayout>
CONNECTIVITY CODE FROM ONE MODULE TO ANOTHER:
import android.content.Intent;
import android.os.Bundle;
import android.view.View;
import androidx.appcompat.app.AppCompatActivity;

public class FirstPageActivity extends AppCompatActivity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.first_page);
    }

    public void speechToTextOnclick(View view) {
        Intent i = new Intent(this, SpeechToTextActivity.class);
        startActivity(i);
    }

    public void textToSpeechOnclick(View view) {
        Intent i = new Intent(this, TextToSpeechActivity.class);
        startActivity(i);
    }

    public void translatorOnclick(View view) {
        Intent i = new Intent(this, MainActivity.class);
        startActivity(i);
    }
}
<?xml version="1.0" encoding="utf-8"?>
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.speechtotext">

    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />

    <application
        android:allowBackup="true"
        android:icon="@mipmap/ic_launcher"
        android:label="@string/app_name"
        android:roundIcon="@mipmap/ic_launcher_round"
        android:supportsRtl="true"
        android:theme="@style/Theme.Speechtotext">

        <activity android:name=".SpeechToTextActivity" />
        <activity android:name=".TextToSpeechActivity" />

        <!-- FirstPageActivity is the launcher; the intent filter needs both
             the MAIN action and the LAUNCHER category. -->
        <activity android:name=".FirstPageActivity">
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />
                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>
    </application>
</manifest>
<LinearLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical"
    android:padding="20dp"
    android:gravity="center"
    tools:context=".TextToSpeechActivity">

    <EditText
        android:id="@+id/et_input"
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:textAlignment="center"
        android:gravity="center_horizontal"
        android:lines="5"
        android:background="@drawable/bg_round" />

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:layout_marginTop="10dp">

        <Button
            android:id="@+id/bt_convert"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_weight="1"
            android:text="speak" />

        <androidx.appcompat.widget.AppCompatSpinner
            android:layout_width="10dp"
            android:layout_height="wrap_content" />

        <Button
            android:id="@+id/bt_clear"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:layout_weight="1"
            android:text="clear" />
    </LinearLayout>
</LinearLayout>
public class TextToSpeechActivity extends AppCompatActivity {
    android.speech.tts.TextToSpeech textToSpeech;
    EditText edtext;
    Button btconvert, btclear;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.text_to_speech);
        edtext = findViewById(R.id.et_input);
        btconvert = findViewById(R.id.bt_convert);
        btclear = findViewById(R.id.bt_clear);

        textToSpeech = new android.speech.tts.TextToSpeech(getApplicationContext(),
                new android.speech.tts.TextToSpeech.OnInitListener() {
            @Override
            public void onInit(int status) {
                if (status == android.speech.tts.TextToSpeech.SUCCESS) {
                    // Engine initialized and ready to speak.
                }
            }
        });

        btconvert.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                String s = edtext.getText().toString();
                int speech = textToSpeech.speak(s,
                        android.speech.tts.TextToSpeech.QUEUE_FLUSH, null);
            }
        });

        btclear.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                edtext.setText("");
            }
        });
    }
}
<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    tools:context=".SpeechToTextActivity">

    <TextView
        android:id="@+id/output"
        android:layout_width="300dp"
        android:layout_height="80dp"
        android:layout_marginTop="144dp"
        android:gravity="center"
        android:textColor="#0C0C0C"
        android:textSize="22sp"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.495"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toTopOf="parent" />

    <Button
        android:id="@+id/rec"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:layout_marginTop="60dp"
        android:onClick="startRec"
        android:text="startRec"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/output"
        tools:ignore="OnClick" />

    <Button
        android:id="@+id/stop"
        android:layout_width="108dp"
        android:layout_height="48dp"
        android:layout_marginTop="44dp"
        android:onClick="stopRec"
        android:text="stopRec"
        app:layout_constraintEnd_toEndOf="parent"
        app:layout_constraintHorizontal_bias="0.512"
        app:layout_constraintStart_toStartOf="parent"
        app:layout_constraintTop_toBottomOf="@+id/rec"
        tools:ignore="OnClick" />
</androidx.constraintlayout.widget.ConstraintLayout>
import android.Manifest;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.os.Bundle;
import android.speech.RecognitionListener;
import android.speech.RecognizerIntent;
import android.speech.SpeechRecognizer;
import android.view.View;
import android.widget.TextView;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.app.ActivityCompat;
import androidx.core.content.ContextCompat;
import java.util.ArrayList;

public class SpeechToTextActivity extends AppCompatActivity {
    TextView txt;
    SpeechRecognizer recognizer;
    Intent intent;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.speech_to_text);
        txt = findViewById(R.id.output);
        System.out.println("Inside on create");
        checkpermission();
        convert();
    }

    public void checkpermission() {
        if (!(ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.RECORD_AUDIO) == PackageManager.PERMISSION_GRANTED)) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[] { Manifest.permission.RECORD_AUDIO }, 1);
        }
        if (!(ContextCompat.checkSelfPermission(SpeechToTextActivity.this,
                Manifest.permission.INTERNET) == PackageManager.PERMISSION_GRANTED)) {
            ActivityCompat.requestPermissions(SpeechToTextActivity.this,
                    new String[] { Manifest.permission.INTERNET }, 1);
        }
    }

    public void convert() {
        recognizer = SpeechRecognizer.createSpeechRecognizer(SpeechToTextActivity.this);
        intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        recognizer.setRecognitionListener(new RecognitionListener() {
            @Override public void onReadyForSpeech(Bundle params) { }
            @Override public void onBeginningOfSpeech() { }
            @Override public void onRmsChanged(float rmsdB) { }
            @Override public void onBufferReceived(byte[] buffer) { }
            @Override public void onEndOfSpeech() { }
            @Override public void onError(int error) { }

            @Override
            public void onResults(Bundle results) {
                // Show the top recognition hypothesis in the output TextView.
                ArrayList<String> words =
                        results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
                if (words != null && !words.isEmpty()) {
                    txt.setText(words.get(0));
                }
            }

            @Override public void onPartialResults(Bundle partialResults) { }

            @Override
            public void onEvent(int eventType, Bundle params) {
                System.out.println("Inside on event");
            }
        });
    }

    public void startRec(View view) {
        txt.setText("");
        recognizer.startListening(intent);
    }

    public void stopRec(View view) {
        recognizer.stopListening();
    }
}
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
xmlns:tools="http://schemas.android.com/tools"
android:layout_width="match_parent"
android:layout_height="match_parent"
android:orientation="vertical"
tools:context=".MainActivity">
<EditText
android:id="@+id/inputToTranslate"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:layout_marginTop="48dp"
android:layout_marginBottom="16dp"
android:inputType="text" />
<Button
android:id="@+id/translateButton"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_gravity="center"
android:layout_marginBottom="32dp"
android:text="Translate" />
<TextView
android:id="@+id/translatedTv"
android:layout_width="wrap_content"
android:layout_height="wrap_content"
android:layout_gravity="center" android:textSize="16sp"
/>
</LinearLayout>
MANIFEST PERMISSION:
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
<uses-permission android:name="android.permission.INTERNET"/>
</manifest>
TRANSLATOR JAVA CODING:
import android.content.Context;
import android.net.ConnectivityManager;
import android.net.NetworkInfo;
import android.os.Bundle;
import android.os.StrictMode;
import android.view.View;
import android.widget.Button;
import android.widget.EditText;
import android.widget.TextView;
import androidx.appcompat.app.AppCompatActivity;
import com.google.auth.oauth2.GoogleCredentials;
import com.google.cloud.translate.Translate;
import com.google.cloud.translate.TranslateOptions;
import com.google.cloud.translate.Translation;
import java.io.IOException;
import java.io.InputStream;

public class MainActivity extends AppCompatActivity {
    private EditText inputToTranslate;
    private TextView translatedTv;
    private Button translateButton;
    private boolean connected;
    Translate translate;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        inputToTranslate = findViewById(R.id.inputToTranslate);
        translatedTv = findViewById(R.id.translatedTv);
        translateButton = findViewById(R.id.translateButton);

        translateButton.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                if (checkInternetConnection()) {
                    // If there is an internet connection, get the translate
                    // service and start the translation:
                    getTranslateService();
                    translate();
                } else {
                    translatedTv.setText(getResources().getString(R.string.no_connection));
                }
            }
        });
    }

    public void getTranslateService() {
        StrictMode.ThreadPolicy policy =
                new StrictMode.ThreadPolicy.Builder().permitAll().build();
        StrictMode.setThreadPolicy(policy);
        try (InputStream is = getResources().openRawResource(R.raw.credentials)) {
            // Get credentials:
            final GoogleCredentials myCredentials = GoogleCredentials.fromStream(is);
            TranslateOptions translateOptions =
                    TranslateOptions.newBuilder().setCredentials(myCredentials).build();
            translate = translateOptions.getService();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

    public void translate() {
        String originalText = inputToTranslate.getText().toString();
        Translation translation = translate.translate(originalText,
                Translate.TranslateOption.targetLanguage("tr"));
        String translatedText = translation.getTranslatedText();
        translatedTv.setText(translatedText);
    }

    public boolean checkInternetConnection() {
        // Check internet connection:
        ConnectivityManager connectivityManager =
                (ConnectivityManager) getSystemService(Context.CONNECTIVITY_SERVICE);
        connected = connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_MOBILE)
                .getState() == NetworkInfo.State.CONNECTED ||
                connectivityManager.getNetworkInfo(ConnectivityManager.TYPE_WIFI)
                .getState() == NetworkInfo.State.CONNECTED;
        return connected;
    }
}
8.3 DATA DICTIONARY
Global vocabulary:
Support your global user base with Speech-to-Text's extensive language support in over 125
languages and variants.
Speech adaptation:
Customize speech recognition to transcribe domain-specific terms and rare words by providing
hints, and boost your transcription accuracy of specific words or phrases. Automatically convert
spoken numbers into addresses, years, currencies, and more using classes.
Speech-to-Text On-Prem:
Have full control over your infrastructure and protected speech data while leveraging Google's
speech recognition technology on-premises, right in your own private data centers. Contact sales
to get started.
Multichannel recognition:
Speech-to-Text can recognize distinct channels in multichannel situations (e.g., video conference)
and annotate the transcripts to preserve the order.
Noise robustness:
Speech-to-Text can handle noisy audio from many environments without requiring additional
noise cancellation.
Domain-specific models:
Choose from a selection of trained models for voice control and phone call and video transcription
optimized for domain-specific quality requirements. For example, our enhanced phone call model
is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling
rate.
Content filtering:
Profanity filter helps you detect inappropriate or unprofessional content in your audio data and
filter out profane words in text results.
Transcription evaluation:
Upload your own voice data and have it transcribed with no code. Evaluate quality by iterating on
your configuration.
CHAPTER 9
BIBLIOGRAPHY AND REFERENCES:
8. https://github.com/topics/speech-to-text
9. http://www.ling.helsinki.fi/~gwilcock/Tartu-2003/L7-Speech/JSAPI/Recognition.html#:~:text=A%20speech%20recognizer%20is%20a,of%20supporting%20classes%20and%20interfaces.