COME ON CODE ON

A blog about programming and more programming.

Archive for the ‘Algorithm’ Category

Subsequence Frequency

with 6 comments

The question is to find the number of times a string appears as a subsequence in an another string. A similar question was asked in Google Code Jam 09. Here is the link.

So how do we do it. If you think the general way the first brute force method which comes to mind is to get a matrix for the general subsequence problem and then backtrack all paths that lead to the subsequence. But the backtracking will be too time consuming. Now the second thought which comes to mind is can’t we have a similar dynamic programming approach to find the frequency rather than length. Yes, thankfully we can do that. How?

First lets start with a string of two letters. Suppose we need to find the occurence of ab in abbacb. We can calculate the answer to be four. How did we do it, there are three b’s after first ‘a’ and only one after the second ‘a’, all in all four. So if somehow we manipulate the table to deal with the frequency of characters we can take out the total numbers of times the string appears as subsequence. If you wan’t to try out the algorithm for yourself, now is the right time. What follows next is the algorithm.
We have the table for the above example as :

    a b b a c b
  0 1 1 1 1 1 1
a 0 1 1 1 2 2 2
b 0 0 1 2 2 2 4

As seen above we set the row[0] values to 1 and column[0] values to 0. Here’s the algorithm.
Let’s call the string whose frequency is to be found as str1 and given string str2.

for i from 1 to length(str1)
{
    for j from 1 to length(str2)
    {
        if (str1[i-1]==str2[j-1])
           table[i][j]=table[i-1][j]+table[i][j-1];
        else
            table[i][j]=table[i][j-1];
    }
}

Here’s a C++ code for the Code Jam Question :

#include
using namespace std;

char str[1000],cj[]="welcome to code jam";
int M[30][1000]={0};

int main()
{
    int t,i,j,k,l,lcj=strlen(cj);;
    scanf("%d\n",&t);
    for (j=1;j<=1000;j++)
        M[0][j]=1;
    for (j=1;j<=lcj;j++)
        M[j][0]=0;
    for (k=1;k<=t;k++)
    {
        gets(str);
        l=strlen(str);
        for (i=1;i<=lcj;i++)
        {
            M[i][0]=0;
            for (j=1;j<=l;j++)
                if (cj[i-1]==str[j-1])
                M[i][j]=(M[i-1][j]+M[i][j-1])%10000;
                else M[i][j]= M[i][j-1];
        }
        printf("Case #%d: %04d\n",k,M[lcj][l]);
    }
}

Happy Coding!
-fR0D

Written by fR0DDY

September 6, 2009 at 3:12 PM

Range Minimum Query

with 13 comments

RANGE QUERY

Given a (big) array r[0..n-1], and a lot of queries of certain type. We may want to pre-process the data so that each query can be performed fast. In this section, we use T (f, g) to denote the running time for an algorithm is O(f(n)) for pre-processing, and O(g(n)) for each query.

If the queries are of type getsum(a, b), which asks the sum of all the elements between a and b, inclusive, we have a T (n, 1) algorithm: Compute s[i] to be the sum from r[0] to r[i-1], inclusive, then getsum(a,b) simply returns s[b+1]-s[a].

RANGE MINIMUM QUERY

For the queries of the form getmin(a,b) asks the minimum elements between r[a] and r[b], inclusive, the task is little more hard. The idea is always to get the min in some big ranges, so in the queries we may try to use these big ranges to compute fast. One simple algorithm is T (n,sqrt(n)): Break the n numbers into sqrt(n) regions, each of size sqrt(n). Compute the champion for each region. For each query getmin(a,b), we go from a towards right to the nearest station (at most sqrt(n) steps), then go by at most sqrt(n) stations (big regions) to the nearest station before b, and from there go to b.

Sparse Table (ST) Algorithm

A better approach is to preprocess RMQ for sub arrays of length 2k using dynamic programming. We will keep an array preProcess[0, N-1][0, logN] where preProcess[i][j] is the index of the minimum value in the sub array starting at i having length 2j. For example :

A[0] A[1] A[2] A[3] A[4] A[5]
2 4 3 1 6 7

For the above array the preProcess[1][0] = 1, preProcess[1][1]=2,preProcess[1][2]=3 and so on. Specifically, we find the minimum in a block of size 2j by comparing the two minima of its two constituent blocks of size 2j-1. More formally, preProcess[i, j] = preProcess[i, j -1] if A[preProcess[i, j -1]] <= A[preProcess[i+2j-1, j -1]] and preProcess[i, j] = preProcess[i+2j-1, j -1] otherwise.

Once we have these values preprocessed, let’s show how we can use them to calculate RMQ(i, j). The idea is to select two blocks that entirely cover the interval [i..j] and  find the minimum between them. We select two overlapping blocks that entirely cover the subrange: let 2k be the size of the largest block that fits into the range from i to j, that is let k = log(j – i). Then rmq(i, j) can be computed by comparing the minima of the following two blocks: i to i + 2k – 1 (preProcess(i, k)) and j – 2k + 1 to j (preProcess(j –2k +1, k)). These values have already been computed, so we can find the RMQ in constant time.

rmq

So as shown above if we need to calculate minimum between A[1] and A[5] we will take two sections of size 4 and compare their minimum values to get the answer. Below is a C++ template class implementation :

int preProcess[1000][10];
template
class RMQMin
{
  T *A;
  public:
    RMQMin(int N,T *array):A(array)
    {
        int i,j;
        for (i=0;i<N;i++)
            preProcess[i][0]=i;
        for (j=1; (1<<j)<=N; j++)
            for (i=0; i+(1<<j)-1<N; i++)
                preProcess[i][j]=
                A[preProcess[i][j-1]]<=
                A[preProcess[i+(1<<(j-1))][j-1]]?
                preProcess[i][j-1]
                :preProcess[i+(1<<(j-1))][j-1];
    }
   
    int query(int start,int end)
    {
        int diff=end-start;
        diff=31 - __builtin_clz(diff+1);
        return A[preProcess[start][diff]]
          <=A[preProcess[end-(1<<diff)+1][diff]]?
          preProcess[start][diff]
          :preProcess[end-(1<<diff)+1][diff];
    }
};

You can simply use the above class for any type of variable(numeric ofcourse). Just keep in mind that you have to declare a preProcess array of size N x log N . So it turns out that the space complexity of the algorithm is O(N log N) and time complexity T(NlogN, 1).

-fR0D

Written by fR0DDY

April 18, 2009 at 6:54 AM

Posted in Algorithm, Programming

Tagged with , , , , , , , ,

Maximum Subarray in 1-D and 2-D Array

with one comment

This is a well known problem wherein we have to find the subarray whose sum is maximum. For 1-D array the fastest time that this can be done is in O(n). The algorithm was given by by Jay Kadane of Carnegie-Mellon University. It can be found here.
Its C++ implementation would look like this.

int MaxSum1D(vector M)
{
    int N=M.size(),i,t=0,S=1<<31;
    for (i=0;i<N;i++)
    {
        t=t+M[i];
        S=max(t,S);
        if (t<0)
           t=0;
    }
    return S;
}

For finding the maximum subarray in a 2-D array the brute force method would take O(n6) time. The trivial solution to it would take O(n4) time which should be good enough for most questions on online judges. Its C++ implementation would be:

int main()
{
    int N,maxsum=1<<31,i,j,k,l,m,sum,p;
    scanf("%d",&N);
    vector< vector > M(N+1,vector(N+1,0));
    vector< vector > S(N+1,vector(N+1,0));
    for (i=1;i<N+1;i++)
        for (j=1;j<N+1;j++)
            scanf("%d",&M[i][j]);
            
    for (i=1;i<N+1;i++)
        for (j=1;j<N+1;j++)
            S[i][j]=M[i][j]+
            S[i-1][j]+S[i][j-1]-S[i-1][j-1];
            
    maxsum=-128;
    for (i=1;i<N+1;i++)
        for (j=1;j<N+1;j++)
            for (k=i;k<N+1;k++)
                for (l=j;lmaxsum)
                       maxsum=sum;
                }
    printf("%d\n",maxsum);
}

But kadane gave a O(n3) solution for it too. In this algorithm we calculate the prefix sum for all possible row combination in O(n2) and then take out their maximum contigous sum in O(n) time. Thus doing the task in O(n3) time. C++ implementation would be :

int MaxSum2D(vector< vector > M)
{
    int S=1<<31,k,j,i,t,s;
    int N=M.size();
    
    for (i=0;i<N;i++)
    {
        vector pr(N,0);
        for (j=i;j<N;j++)
        {
            t=0;s=1<<31;
            for (k=0;k<N;k++)
            {
                pr[k]=pr[k]+M[j][k];
                t=t+pr[k];
                s=max(t,s);
                if (t<0)
                   t=0;
            }
            S=max(S,s);
        }
    }
    return S;
}

For finding the minimum and the position of the subarrays slight changes need to be made to the codes.

-fR0D

Written by fR0DDY

April 7, 2009 at 1:52 PM