Python

Longest Common Substring Python

The problem is to find the longest common substring in a given string. The task is to take two strings and find the longest common substring with or without repeating characters. In other words, match the longest common substring given in the same order and present in both the strings. For instance, ‘Tech’ is a sequence of characters given in ‘NextTech’, which is also the substring.

The process to find the longest common subsequence:

The simple process to find the longest common subsequence is to check each character of string 1 and find the same sequence in string 2 by checking each character of string 2 one by one to see whether any substring is common in both strings. For example, let’s say we have a string 1 ‘st1’ and string 2 ‘st2’ with lengths a and b, respectively. Check all the substrings of ‘st1’ and start iterating through ‘st2’ to check if any substring of ‘st1’ exists as ‘st2’. Start with matching the substring of length 2 and increasing the length by 1 in each iteration, rising to the maximum length of the strings.

Example 1:

This example is about finding the longest common substring with repeating characters. Python provides simple built-in methods to perform any functions. In the below example, we have provided the simplest way to find the longest common subsequence in 2 strings. Combining the ‘for’ and ‘while’ loops is utilized to get the longest common substring in a string. Have a look at the example given below:

def LongComSubS(st1, st2):
  ans = 0;
  for a in range(len(st1)):
         for b in range(len(st2)):
            k = 0;
            while ((a + k) < len(st1) and (b + k) < len(st2)
        and st1[a + k] == st2[b + k]):
                k = k + 1;

            ans = max(ans, k);
  return ans;

if __name__ == '__main__':
 
    A = 'ABBAAB'
    B = 'BABAAB'
 
    i = len(A)
    j = len(B)
 
    print('The longest common substring in a string is', LongComSubS(A, B))

The following output will be produced after executing the above code. It will find the longest common substring and give you as output.

Example 2:

Another way to find the longest common substring is to follow the iterative approach. A ‘for’ loop is used for iteration, and an ‘if’ condition matches the common substring.

def LongComSubS(A, B, m, n):
 
    maxLen = 0
    endIndex = m
   
    FIND = [[0 for x in range(n + 1)] for y in range(m + 1)]
 
    for i in range(1, m + 1):
        for j in range(1, n + 1):
 
            if A[i - 1] == B[j - 1]:
                FIND[i][j] = FIND[i - 1][j - 1] + 1
 
                if FIND[i][j] > maxLen:
                    maxLen = FIND[i][j]
                    endIndex = i
 
    return X[endIndex - maxLen: endIndex]
 
 
if __name__ == '__main__':
 
    A = 'ABBAAB'
    B = 'BABAAB'
 
    i = len(A)
    j = len(B)
 
    print('The longest common substring in a string is', LongComSubS(A, B, i, j))

Execute the above code in any python interpreter to get the desired output. However, we have used the Spyder tool to execute the program to find the longest common substring in a string. Here is the output of the above code:

Example 3:

Here is another example to help you find the longest common substring in a string using python coding. This method is the smallest, simplest, and easiest way to find the longest common subsequence. Have a look at the example code given below:

def common(st1, st2):

    def _iter():
        for a, b in zip(st1, st2):
            if a == b:
                yield a
            else:
                return

    return ''.join(_iter())

if __name__ == '__main__':
 
    A = 'ABBAAB'
    B = 'BABAAB'
 
    print('The longest common substring in a string is', LongComSubS(A, B))

Below you can find the output of the code given above

Using this method, we have not returned the common substring but the length of that common substring. To help you get the desired result, we have shown both outputs and methods for getting those results.

The time complexity and space complexity for finding the longest common substring

There is some cost to pay to perform or execute any function; time complexity is one of those costs. The time complexity of any function is calculated by analyzing how much time a statement can take to execute. Hence, to find all the substrings in ‘st1’, we need O(a^2), where ‘a’ is the length of ‘st1’ and ‘O’ is the symbol of time complexity. However, the time complexity of the iteration and finding whether the substring exists in ‘st2’ or not is O(m), where ‘m’ is the length of ‘st2’. So the total time complexity of discovering the longest common substring in two strings is O(a^2*m). Moreover, the space complexity is another cost of executing a program. The space complexity represents the space a program or a function will keep in the memory during execution. Hence, the space complexity of finding the longest common subsequence is O(1), as it does not require any space to execute.

Conclusion:

In this article, we have learned about the methods of finding the longest common substring in a string using python programming. We have provided three simple and easy examples to get the longest common substring in python. The first example uses the combination of ‘for’ and ‘while loop. While in the second example, we have followed the iterative approach by using the ‘for’ loop and ‘if’ logic. On the contrary, in the third example, we simply used the python built-in function to get the length of the common substring in a string. In contrast, the time complexity of finding the longest common substring in a string using python is O(a^2*m), where a and ma are the length of the two strings; string 1 and string 2, respectively.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write for Linux and other technology related content