The Holistic Look at Longest Common Subsequence Problem

Problem-solving plays a significant role in programming interviews. Numerous product-based companies prefer assessing their applicants' core problem-solving abilities. And optimization problems are the must-haves for these programming interviews as these problems consist of complex and vast solution spaces. The longest common subsequence problem is also one of the classic optimization problems. So, in this article, we will understand this LCS problem in detail along with different ways to formulate its solution.

Here's How to Land a Top Software Developer Job

Full Stack Developer - MERN StackExplore Program
Here's How to Land a Top Software Developer Job

Understanding Longest Common Subsequence Problem

A subsequence is nothing but a series of elements that occur in the same order but are not necessarily contiguous. Suppose, A and B are two given sequences consisting of a finite set of characters. Then, we can say that C is a common subsequence of both A and B, if and only if C can be derived from A and B. 

If two sequences are given in the LCS problem, then our task is to identify a common subsequence that has a maximum length. Let’s look at an example to understand this problem better. Consider two strings given below:

                                                      S1 = “ABCDE”

                                                       S2 = “CDE”

Now, let’s look at possible common subsequences for both S1 and S2.

          Common Subsequences: “C”, “D”, “E”, “CD”, “DE”, “CE”, “CDE”

Out of these common subsequences, subsequence CDE has a maximum length. Thus, it will be considered as the longest common subsequence for S1 and S2. Moving forward, we will look into a recursive solution for the longest common subsequence problem.

Recursive Solution for LCS Problem

Let’s say that we are given two sequences S1 and S2, having lengths m and n, respectively. And we want to find out the longest common subsequence using the naive recursive approach. In order to do that, the first step we can perform is to determine if each subsequence of S1 is also a subsequence of S2 or not. 

There is a possibility of having 2m subsequences for sequence S1. And it takes O(n) time to test if a subsequence of S1 is a subsequence of S2 or not. Thus, the overall time complexity for this approach will be O(2m*n). 

Learn the Ins & Outs of Software Development

Caltech Coding BootcampExplore Program
Learn the Ins & Outs of Software Development

Let’s solve one problem using this naive recursive approach.

Problem Statement: Create a program to find out the length of longest common subsequence for strings A = “XY”, B = “XPYQ”.

Solution: First, let’s understand the programming implementation of our LCS problem.

If you look into the code of LCS function, you will find three evaluation conditions as mentioned below:

  • Condition 1: Check if both character arrays are empty. If they are empty, return 0. 
  • Condition 2: If the character arrays are not empty, then check if the last character of both character arrays matches or not. If you get a match, then return 1 + LCS (Both character arrays with reduced size).
  • Condition 3: If you don’t get a character match, move either one character in the first character array A or the second character array B. Pick whichever produces maximum value. 

The program given below follows the strategy mentioned above to find out the length of longest common subsequence.

#include<stdio.h>
#include<string.h>
int LCS( char *A, char *B, int x, int y )
{
if (x == 0 || y == 0)
    return 0;
if (A[x-1] == B[y-1])
    return 1 + LCS(A, B, x-1, y-1);
else
    return max(LCS(A, B, x, y-1), LCS(A, B, x-1, y));
} 
int max(int m, int n)
{
    return (m > n)? m : n;
}
int main()
{
char A[] = "XY";
char B[] = "XPYQ";
int x = strlen(A);
int y = strlen(B);
printf("Length of LCS is %d", LCS( A, B, x, y ));
return 0;
}

Image Reference

Output: The console output for the program above is represented in image below.

Longest_Common_Subsequence_Using_Recursion.

Explanation: By following the conditions mentioned above, we can find out the length of longest common subsequence. The call stack for this recursive algorithm will be as represented in the image given below:

Call_Stack_LCS_Recursion.

Here's How to Land a Top Software Developer Job

Full Stack Developer - MERN StackExplore Program
Here's How to Land a Top Software Developer Job

The evaluation of the length of LCS (denoted as sum) is done as represented below:

LCS_Length_Evaluation

For this simple LCS problem, we get a vast solution space. Also, the time-complexity is going to increase at an exponential rate (O(2n)) as it is dependant on the depth of recursive tree 

(2(depth of recursive tree)). Due to huge time and space complexity, this naive recursive solution is not an ideal solution for this longest common subsequence problem. Hence, in the next section, we will look into a more reliable and optimal approach to solve this problem.

Dynamic Programming Implementation of LCS

The dynamic programming paradigm consists of two methods known as the top-down approach and the bottom-up approach. The top-down approach memorizes results in a recursive solution, whereas the bottom-up approach builds a solution from the base case. Hence, for this particular problem, we will formulate a bottom-up solution.

Problem Statement: Create a program to find out the length of longest common subsequence for strings A = “XY”, B = “XPYQ”.

Solution: Let’s look into the steps to implement a bottom-up solution for our LCS problem.

  • Step 1: Make a table with the dimensions (x+1*y+1), where x and y are the lengths of strings A and B, respectively. Insert a null column and row to represent the end of the string. Now, insert zeros in all first rows and columns.

Tabulation_Step1

  • Step 2: Use the logic given below to fill each cell in the table.
  • Step 3: Fill the current cell by adding one to the diagonal element if the character equivalent to the current row and column matches. Also, draw an arrow to the cell on the diagonal.

Step3_Tabulation.

  • Step 4: Otherwise, fill the current field with the largest value from the preceding column and row element. Draw an arrow to the cell with the highest value. If they're all the same, you can point at any of them.

Step4_Tabulation.

  • Step 5: Repeat steps from 2 to 4 until the whole table is filled.

Step5_Tabulation_Longest_Common_Subsequence.

  • Step 6: The length of the longest common subsequence is reflected by the value in the final row and column.

Step6_Tabulation_Longest_Common_Subsequence

The C program based on the strategy mentioned above is given below:

#include<string.h>
#include<stdio.h>
int LCS( char *A, char *B, int x, int y )
{
    int L[x+1][y+1];
    int i, j; 
    for(i =0; i<=x; i++){
        for(j=0; j<=y; j++){
            if( i==0|| j==0)
                 L[i][j] = 0;
            else if(A[i-1] == B[j-1]){
                L[i][j] = L[i-1][j-1] + 1;
            }
            else{
                L[i][j] = maximum(L[i-1][j], L[i][j-1]);
            }
        }
    }
    return L[x][y];
}
int maximum(int a, int b)
{
    return (a > b)? a : b;
} 
int main()
{
char A[] = "XY";
char B[] = "XPYQ";
int x = strlen(A);
int y = strlen(B);
printf("The Length of Longest Common Subsequence is %d", LCS( A, B, x, y ) );
return 0;
}

Image Reference

Output: The console output for the dynamic programming implementation of LCS is given below:

Longest_Common_Subsequence_DP_Output.

Explanation: The dynamic programming approach cuts down the number of function calls. It saves the outcome of each function call; so that it can be reused without the need for duplicate calls in the future. The results of each comparison between elements of A and B are maintained in tabular format to remove redundant computations.

Due to that, the time taken by a dynamic programming approach to solve the LCS problem is equivalent to the time taken to fill the table, that is, O(m*n). This complexity is relatively low in comparison to the recursive paradigm. Hence, dynamic programming is considered as an optimal strategy to solve this space optimization problem.

Advance your career as a MEAN stack developer with the Full Stack Web Developer - MEAN Stack Master's Program. Enroll now!

Conclusion

In this longest common subsequence problem article, you learned what the LCS problem is with the help of examples. You also discovered the recursive solution to the LCS problem, along with its complexity analysis. After that, you looked into a more optimal approach to implement the LCS solution, which is known as dynamic programming. Finally, you came across the strategy and program to build a bottom-up dynamic programming solution.  If you want a drive-through explanation to understand and implement the longest common subsequence problem, we advise you to watch this video: 

Every day, new products, tools, and apps are being introduced in the market. And there are hundreds of programming languages and frameworks being utilized every day in the realm of software development. Hence, it's crucial for you to go beyond data structure concepts and cover the foundations of interactive application development. Simplilearn's Software Development Courses provide the right platform for you to master the art of software development. The courses can help you improve your odds of becoming a successful software developer. So, go ahead and start exploring!

If you have any questions or need clarification on any topic from this article on the longest common subsequence, please leave them in the comments section below and we will respond to them soon!

About the Author

Omkar Prabhakar Ghadage.Omkar Prabhakar Ghadage.

Omkar holds a bachelor's degree in computer science with a machine learning minor. Artificial intelligence and automation are two topics that he's passionate about. Python, R, and C++ are among his programming languages of choice. He enjoys watching web series & indulging in adventurous activities

View More
  • Acknowledgement
  • PMP, PMI, PMBOK, CAPM, PgMP, PfMP, ACP, PBA, RMP, SP, OPM3 and the PMI ATP seal are the registered marks of the Project Management Institute, Inc.