Lab 1 - Introduction To Python

This document outlines three tasks for a CS370 Artificial Intelligence lab assignment on calculating Levenshtein distance between strings using Python. The first task involves writing a program to calculate the edit distance between two input strings. The second task modifies this to take two text files as input and output the word-level distance between sentences. The third task further modifies this to ignore common words when calculating the distance. Students are to submit their Python programs to calculate Levenshtein distance between strings and text files in various ways.

Department of Computing

CS370: Artificial Intelligence


Class: BSCS-11C
Lab 01: Introduction to Python

Date: 15-09-2023

Lab Engineer: Ms Shakeela

Mahad Mohtashim
379889
BSCS-11-C

Task #1

Write a Python program that takes two strings as input and calculates the
Levenshtein/Edit distance between them.

Explanation:-

Levenshtein/Edit distance gives us a measure of similarity between two strings/sequences.


Formally, it is the minimum number of single-character edits required to transform
one string into another.

Single-character edits include:-

• Insertion
• Deletion
• Substitution
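
As a small illustration, each of the three edits can be written as a one-line string operation in Python. The helper names below are hypothetical and not part of the assignment; the sketch applies two substitutions and one insertion to turn "kitten" into "sitting".

```python
def insert(s, i, ch):
    """Insert character ch before position i of s."""
    return s[:i] + ch + s[i:]


def delete(s, i):
    """Delete the character at position i of s."""
    return s[:i] + s[i + 1:]


def substitute(s, i, ch):
    """Replace the character at position i of s with ch."""
    return s[:i] + ch + s[i + 1:]


# Three single-character edits turn "kitten" into "sitting".
step1 = substitute("kitten", 0, "s")  # "sitten"
step2 = substitute(step1, 4, "i")     # "sittin"
step3 = insert(step2, 6, "g")         # "sitting"
```

No shorter sequence of edits exists for this pair, so its Levenshtein distance is 3.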

Mathematically:-

The Levenshtein/Edit distance between two strings ‘a’ and ‘b’ is defined as:-
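
For reference, the standard recursive definition is usually written as follows; this is assumed to be the form intended here. The symbols are assumptions of this sketch: i and j are prefix lengths of a and b, a_i and b_j are the i-th and j-th characters, and the distance between the full strings is lev_{a,b}(|a|, |b|).

```latex
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
  \max(i,j) & \text{if } \min(i,j) = 0, \\[4pt]
  \min
  \begin{cases}
    \operatorname{lev}_{a,b}(i-1,j) + 1 \\
    \operatorname{lev}_{a,b}(i,j-1) + 1 \\
    \operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
  \end{cases} & \text{otherwise.}
\end{cases}
```

The three terms inside the minimum correspond to a deletion, an insertion, and a match or substitution respectively; the indicator 1_{(a_i \neq b_j)} is 0 when the two characters are equal and 1 otherwise.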

For a deeper understanding of the formula, you may read the blog below, which explains it in
great depth, or get back to me whenever you get stuck.

https://medium.com/@ethannam/understanding-the-levenshtein-distance-equation-for-beginners-c4285a5604f0

Note, however, that the blog does not explain how to count the individual edit operations
(insertions, deletions and substitutions) while calculating the overall Levenshtein distance.
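
One possible way to count the operations is to fill the usual dynamic-programming table and then walk back from its last cell, recording which kind of step was taken. The following is a minimal sketch under that assumption, not a reference solution. The names `edit_distance` and `main` are hypothetical, the printed wording is borrowed from the result.txt example of Task #2, and the order in which ties are broken (match, then insertion, then deletion, then substitution) is a choice made for this sketch.

```python
def edit_distance(a, b):
    """Return (distance, insertions, deletions, substitutions) for turning a into b."""
    m, n = len(a), len(b)
    # dist[i][j] holds the edit distance between a[:i] and b[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters of a
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back from the bottom-right cell and count each kind of step.
    insertions = deletions = substitutions = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            i, j = i - 1, j - 1             # match, no edit needed
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            insertions += 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            deletions += 1
            i -= 1
        else:
            substitutions += 1
            i, j = i - 1, j - 1
    return dist[m][n], insertions, deletions, substitutions


def main():
    """Read two strings from the keyboard and print the counts."""
    first = input("Enter first string: ")
    second = input("Enter second string: ")
    distance, ins, dels, subs = edit_distance(first, second)
    print(f"Levenshtein distance is {distance}")
    print(f"Insertions {ins}")
    print(f"Deletions {dels}")
    print(f"Substitutions {subs}")
```

Calling `main()` runs the program interactively. For example, `edit_distance("kitten", "sitting")` returns a distance of 3 made up of 1 insertion, 0 deletions and 2 substitutions.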

The output of your program should look something like this:-

Task #2

Now modify the program written above so that it takes two text files, named reference.txt and
hypothesis.txt, each containing a single-line, lowercase English sentence, and writes a file
result.txt containing the Levenshtein distance between these two files, as shown below. The
distance should be computed at word level, not character level.

**********reference.txt***************

this is some text and we would like to see if it has been identified correctly by speech recognition system

***************************************

**********hypothesis.txt*************

this is a text and we would like to check what has been identified by the speech recognition

***************************************

*********result.txt*******************

Levenshtein distance is 7

Insertions 1

Deletions 3

Substitutions 3

***************************************

Hint:-

In this case, we can treat words the same way we treated characters in the previous task, right?
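
Following the hint, a minimal sketch could split each file into a list of words and run the same table-filling and walk-back procedure over those lists. The function names `word_edit_counts` and `compare_files` are hypothetical, and the tie-breaking order in the walk-back (match, then insertion, then deletion, then substitution) is an assumption of this sketch.

```python
from pathlib import Path


def word_edit_counts(ref_words, hyp_words):
    """Return (distance, insertions, deletions, substitutions) between two word lists."""
    m, n = len(ref_words), len(hyp_words)
    # dist[i][j] is the word-level distance between the first i reference
    # words and the first j hypothesis words.
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion of a reference word
                dist[i][j - 1] + 1,         # insertion of a hypothesis word
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back through the table to count the operations.
    insertions = deletions = substitutions = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref_words[i - 1] == hyp_words[j - 1]:
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            insertions += 1
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            deletions += 1
            i -= 1
        else:
            substitutions += 1
            i, j = i - 1, j - 1
    return dist[m][n], insertions, deletions, substitutions


def compare_files(ref_path="reference.txt", hyp_path="hypothesis.txt",
                  out_path="result.txt"):
    """Compare two single-line text files word by word and write the result file."""
    ref_words = Path(ref_path).read_text().split()
    hyp_words = Path(hyp_path).read_text().split()
    distance, ins, dels, subs = word_edit_counts(ref_words, hyp_words)
    Path(out_path).write_text(
        f"Levenshtein distance is {distance}\n"
        f"Insertions {ins}\n"
        f"Deletions {dels}\n"
        f"Substitutions {subs}\n"
    )
    return distance, ins, dels, subs
```

On the sample sentences above this sketch reports a distance of 7 with 1 insertion, 3 deletions and 3 substitutions. The total is fixed, but the split is not: an alignment with 5 substitutions and 2 deletions also costs 7, so a walk-back that breaks ties in a different order may report different counts.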

Task #3

Now modify the above program so that it ignores 10 common words as follows:-

• Insertions and deletions involving these common words are ignored
• Substitutions are ignored when both the initial and the final word are among the 10 common words

List of 10 common words:

the, of, and, a, be, this, there, an, been, some

The result2.txt file should now look like this:-

*********result2.txt*******************

Levenshtein distance is 5

Insertions 0

Deletions 3

Substitutions 2

***************************************
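
One way to read the two rules of Task #3 is to compute the same word-level alignment as in Task #2 and then leave the ignored operations out of the counts. The sketch below follows that interpretation, which is an assumption; the names `align` and `filtered_counts` are hypothetical, and the walk-back again prefers a match, then an insertion, then a deletion, then a substitution.

```python
COMMON_WORDS = {"the", "of", "and", "a", "be", "this", "there", "an", "been", "some"}


def align(ref_words, hyp_words):
    """Return the edits, as (kind, ref_word, hyp_word), that turn ref into hyp."""
    m, n = len(ref_words), len(hyp_words)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)

    # Walk back through the table, recording each edit with the words involved.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref_words[i - 1] == hyp_words[j - 1]:
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            ops.append(("insertion", None, hyp_words[j - 1]))
            j -= 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("deletion", ref_words[i - 1], None))
            i -= 1
        else:
            ops.append(("substitution", ref_words[i - 1], hyp_words[j - 1]))
            i, j = i - 1, j - 1
    ops.reverse()
    return ops


def filtered_counts(ref_words, hyp_words, common=COMMON_WORDS):
    """Return (distance, insertions, deletions, substitutions), skipping ignored edits."""
    counts = {"insertion": 0, "deletion": 0, "substitution": 0}
    for kind, ref_word, hyp_word in align(ref_words, hyp_words):
        if kind == "insertion" and hyp_word in common:
            continue  # inserted word is a common word
        if kind == "deletion" and ref_word in common:
            continue  # deleted word is a common word
        if kind == "substitution" and ref_word in common and hyp_word in common:
            continue  # both the initial and the final word are common
        counts[kind] += 1
    total = sum(counts.values())
    return total, counts["insertion"], counts["deletion"], counts["substitution"]
```

On the sample sentences this gives a distance of 5 with 0 insertions, 3 deletions and 2 substitutions: the insertion of "the" and the substitution of "some" by "a" are the two edits that drop out. The counts can be written to result2.txt in the same way as for result.txt. An alternative reading would give the ignored operations zero cost inside the table itself, which can change the alignment that is chosen.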

Submission Guidelines:-

Deliverables and Deadline:

Please add as per your convenience
