## Subsequence Frequency

The question is to find the number of times a string appears as a subsequence in an another string. A similar question was asked in Google Code Jam 09. Here is the link.

So how do we do it. If you think the general way the first brute force method which comes to mind is to get a matrix for the general subsequence problem and then backtrack all paths that lead to the subsequence. But the backtracking will be too time consuming. Now the second thought which comes to mind is can’t we have a similar dynamic programming approach to find the frequency rather than length. Yes, thankfully we can do that. How?

First lets start with a string of two letters. Suppose we need to find the occurence of ab in abbacb. We can calculate the answer to be four. How did we do it, there are three b’s after first ‘a’ and only one after the second ‘a’, all in all four. So if somehow we manipulate the table to deal with the frequency of characters we can take out the total numbers of times the string appears as subsequence. If you wan’t to try out the algorithm for yourself, now is the right time. What follows next is the algorithm.

We have the table for the above example as :

a | b | b | a | c | b | ||

0 | 1 | 1 | 1 | 1 | 1 | 1 | |

a | 0 | 1 | 1 | 1 | 2 | 2 | 2 |

b | 0 | 0 | 1 | 2 | 2 | 2 | 4 |

As seen above we set the row[0] values to 1 and column[0] values to 0. Here’s the algorithm.

Let’s call the string whose frequency is to be found as str1 and given string str2.

for i from 1 to length(str1) { for j from 1 to length(str2) { if (str1[i-1]==str2[j-1]) table[i][j]=table[i-1][j]+table[i][j-1]; else table[i][j]=table[i][j-1]; } }

Here’s a C++ code for the Code Jam Question :

#include using namespace std; char str[1000],cj[]="welcome to code jam"; int M[30][1000]={0}; int main() { int t,i,j,k,l,lcj=strlen(cj);; scanf("%d\n",&t); for (j=1;j<=1000;j++) M[0][j]=1; for (j=1;j<=lcj;j++) M[j][0]=0; for (k=1;k<=t;k++) { gets(str); l=strlen(str); for (i=1;i<=lcj;i++) { M[i][0]=0; for (j=1;j<=l;j++) if (cj[i-1]==str[j-1]) M[i][j]=(M[i-1][j]+M[i][j-1])%10000; else M[i][j]= M[i][j-1]; } printf("Case #%d: %04d\n",k,M[lcj][l]); } }

Happy Coding!

-fR0D

Why 10000 is used in :-

M[i][j]=(M[i-1][j]+M[i][j-1])%10000;

XT3RMOctober 2, 2009 at 8:40 PM

It was specified in the question. Only last 4 digit of the answer had to be given.

fR0DOctober 2, 2009 at 8:52 PM

Shouldn’t it be table[i-1][j-1]+table[i][j-1] instead of table[i-1][j]+table[i][j-1] ?

Take e.g. str1=str2=”bb”, answer should be 2, but your algorithm would give 3.

VLMarch 7, 2011 at 1:22 AM

Sorry, I meant the answer should be 1, but your algorithm gives 2.

VLMarch 7, 2011 at 1:25 AM

Ho yes!!!, the recurrence it´s table[i-1][j-1]+table[i][j-1] not table[i-1][j]+table[i][j-1]

DarkAvengerSeptember 8, 2011 at 1:55 AM

I would like to know whether it is possible to “write a program or algorithm” to find the time and space complexity of any given program taken as input.

Input : any program (P) [in any language or of a particular language]

Output : time complexity of that program (P).

Have there been any prior attempts to write such a program? Are there any algorithms available for this purpose?

If so please provide the necessary links, references or with any kind of guidance possible.

zinatMarch 13, 2014 at 11:41 AM