Prerequisites:

It is highly recommended that you read my previous posts on dynamic programming if you are just starting with this concept.

This post is includes 2 dimensional dynamic programming which can be very confusing if you are just starting with the concept.

Therefore, if you do not know much of this concept, read the following posts:

The Problem:

We are given two strings, str1 and str2. We can do the following operations:

  1. Insert: Insert a letter at a particular index.
  2. Remove: Remove a letter from a particular index.
  3. Replace: Replace a letter at a particular index with another letter.

We have to find the minimum number of edits required to convert str1 into str2.

The solution:

To solve this let us think of an example:

Let’s say str1 is “Sunday” and str2 is “Monday”. Using intuition, the most obvious answer is 2. Because all you need to do to convert str1 to str2 is:

  1. Replace “s” with “m”. Thus, str1 becomes “munday”
  2. Replace “u” with “o”. Thus, str1 becomes “monday” and we have successfully converted str1 into str2.

This is very easy to solve using intuition, however, making an algorithm for this can get complicated.

To think of a solution to this problem using dynamic programming, let’s consider our very trusty data-structure, the array. In this case it will be 2 dimensional. Let the number of columns be the length of str1 (“m”) + 1 and the number of rows be the length of str2(“n”) + 1. You will understand the reason for the “+1” in a while. Also, let’s write the 2 words on top of the arrays for better visualization. For now, fill the array with zeroes. It will look like:

Let each element of the array represent the minimum number of moves needed to convert the string till that element on top, to the string till that element on the left.

For example, dp[1][2] will represent the number of moves needed to convert “SU” into “M”, which is obviously 2.

Therefore, if we fill the array with their respective values, the last element, i.e., dp[m][n] will give you the minimum number of moves needed to convert str1 into str2.

Think about this for a while. How would we fill this array using a clear-cut algorithm?

Have you thought enough?

Well, the answer is quite logical and beautiful.

Filling the first row and first column is quite easy. As all you need to do in this is find how many moves it takes to convert the given string into 0, which in this case basically means a null string. So, the array would look like:

Now think about this:

If the last letters of the given strings are the same, then it can be assumed that the minimum number of moves needed to convert the first string into the second string is the number of moves needed to convert the first string without the last letter into the second string without the last letter. Then the value of that element will be equal to the value of the element immediately top-left to it. So, this means dp[i][j] = dp[i-1][j-1].

However, if the last letters if the given strings are not the same, we cannot assume the above statement. We will have to perform the given operations on the last letter as well.

We can either insert, delete or replace. Let’s analyse what will happen when we perform these operations:

  1. Replace:
    Assuming we replace with an appropriate letter, the situation will represent, the previous situation where the last letter is the same (m-1 and n-1). Thus, the number of steps needed to convert the first string into the second string, after having performed the replace function on the last letter of the first string will be dp[i-1][j-1] + 1.
  2. Insert:
    If we insert with an appropriate letter, we are increasing the length of the first string and making sure that the last letters are the same. Hence, the number of moves will basically be the number of moves needed to convert the first string without the new letter into the second string without the last letter (m and n-1), plus 1 for the insert move. So, that means the value of that element will be dp[i-1][j] + 1, which is the value of the element to the top of this element +1.
  3. Delete:
    If we delete an appropriate letter, we are reducing the length of the first string by one and making sure that the last letters are the same. Hence, the number of moves will be the number of moves required to convert the first string without the last letter into the second string (m-1 , n), plus 1 for the delete move. So, that means the value of that element will be dp[i][j-1] + 1, which is the value of the element that is to the left of the given element.

Now the following question arises:

This is all well and good. We have understood what to do for each situation. However, choosing which operation is most efficient and should be run, is another mystery.

To answer this, let us look at what happens when we are choosing which operation to perform. For example, suppose we want to find the value of dp[1][1]. It has been highlighted (purple colour) in the following table:

Now,

  1. If we replace, the value will be dp[i-1][j-1] + 1 which in this case is dp[0][0] + 1, which equals 0+1 = 1.
  2. If we insert, the value will be dp[i-1][j] + 1, which in this case is dp[0][1] + 1, which equals 1+1 = 2.
  3. If we delete, the value will be dp[i][j-1] + 1, which in this case is dp[1][0] + 1, which equals 1+1 = 2.

It is observed that the best case for this element is to replace, as this gives the minimum moves.

Therefore, the value of an element can be written as:

1 + min( dp[i-1][j-1] , dp[i-1][j] , dp[i][j-1])

However, if the last letters are equal, we do not need to do this, we can just fill the element as equal to dp[i-1][j-1]

Also, remember we need to fill the first row and column using the method described at the beginning of the solution.

After filling up the full array, it looks like this:

Confirm if this is true with your understanding of the algorithm.

Therefore, the answer is dp[n][m]. This is 2, which is the answer we got when we tried to solve with logic.

The Program:

Now that we have discussed the solution, lets take a look at the program (written in pyhton):

s1 = input()
s2 = input()
m = len(s1)
n = len(s2)
dp = [[-1 for x in range(m+1)] for x in range(n+1)] 
for i in range(n+1):
    for j in range(m+1):
        if i == 0:
            dp[i][j] = j
        elif j == 0:
            dp[i][j] = i
        elif s2[i-1] == s1[j-1]:
            dp[i][j] = dp[i-1][j-1]
        else:
            dp[i][j] = min(dp[i-1][j-1]+1 , dp[i][j-1]+1 , dp[i-1][j]+1)
print("Answer is:",dp[n][m])

Output of the above program:

Github link of the above program:

https://github.com/HSNA243/PythonPrograms/blob/master/StringEdit.py

Picture from:

Business vector created by freepik – www.freepik.com

Visits: 301

One Reply to “Edit Distance – Dynamic Programming 3”

Leave a Reply

Your email address will not be published. Required fields are marked *