COME ON CODE ON

A blog about programming and more programming.

Posts Tagged ‘matching

Knuth–Morris–Pratt Algorithm (KMP)

with one comment

Knuth–Morris–Pratt algorithm is the most popular linear time algorithm for string matching. It is little difficult to understand and debug in real time contests. So most programmer’s have a precoded KMP in their kitty.

To understand the algorithm, you can either read it from Introduction to Algorithms (CLRS) or from the wikipedia page. Here’s a sample C++ code.

void preKMP(string pattern, int f[])
{
        int m = pattern.length(),k;
        f[0] = -1;
        for (int i = 1; i<m; i++)
        {
                k = f[i-1];
                while (k>=0)
                {
                        if (pattern[k]==pattern[i-1])
                                break;
                        else
                                k = f[k];
                }
                f[i] = k + 1;
        }
}
 
bool KMP(string pattern, string target)
{ 
        int m = pattern.length();
        int n = target.length();
        int f[m];
        
        preKMP(pattern, f);
        
        int i = 0;
        int k = 0;
        
        while (i<n)
        {
                if (k==-1)
                {
                        i++;
                        k = 0;
                }
                else if (target[i]==pattern[k])
                {
                        i++;
                        k++;
                        if (k==m)
                                return 1;
                }
                else
                        k=f[k];
        }
        return 0;
}

NJOY!
-fR0DDY

Written by fR0DDY

August 29, 2010 at 12:20 PM

The Z Algorithm

with one comment

In this post we will discuss an algorithm for linear time string matching. It is easy to understand and code and is usefull in contests where you cannot copy paste code.

Let our string be denoted by S.
z[i] denotes the length of the longest substring of S that starts at i and is a prefix of S.
α denotes the substring.
r denotes the index of the last character of α and l denotes the left end of α.

To find whether a pattern(P) of length n is present in a target string(T) of length m, we will create a new string S = P$T where $ is a character present neither in P nor in T. The space taken is n+m+1 or O(m). We will compute z[i] for all i such that 0 < i < n+m+1. If z[i] is equal to n then we have found a occurrence of P at position i – n – 1. So we can all the occurrence of P in T in O(m) time. To calculate z[i] we will use the z algorithm.

The Z Algorithm can be read from the section 1.3-1.5 of book Algorithms on strings, trees, and sequences by Gusfield. Here is a sample C++ code.

bool zAlgorithm(string pattern, string target)
{
	string s = pattern + '$' + target ;
	int n = s.length();
	vector<int> z(n,0);
	
	int goal = pattern.length();
	int r = 0, l = 0, i;
	for (int k = 1; k<n; k++)
	{
		if (k>r)
		{
			for (i = k; i<n && s[i]==s[i-k]; i++);
			if (i>k)
			{
				z[k] = i - k;
				l = k;
				r = i - 1;
			}
		}
		else
		{
			int kt = k - l, b = r - k + 1;
			if (z[kt]>b)
			{
				for (i = r + 1; i<n && s[i]==s[i-k]; i++);
				z[k] = i - k;
				l = k;
				r = i - 1;
			}
		}
		if (z[k]==goal)
			return true;
	}
	return false;
}

NJOY!
-fR0D

Written by fR0DDY

August 29, 2010 at 12:03 AM

%d bloggers like this: