
Phosphorus and Lime

A Developer's Broadsheet

This blog has been deprecated. Please visit my new blog at klenwell.com/press.
PHP: Fuzzy Search Functions
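Moving into the hairy territory of fuzzy searching after the person I was supposed to be working with let me down. Three basic functions built on PHP's levenshtein(), comparing raw strings, soundex() keys, and metaphone() keys respectively: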

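function get_leven_score($string, $compare_string)
{
    // *** DATA

    # internal
    $_leven = 0;
    $_length = strlen($string);

    # return (1000 is the fallback score when $string is empty)
    $score = 1000;


    // *** MANIPULATE

    # get levenshtein distance
    $_leven = levenshtein($string, $compare_string);

    # scale distance relative to string length, per 1000 (not a percentage);
    # skip the division when $string is empty to avoid dividing by zero
    if ( $_length > 0 )
    {
        $score = ($_leven/$_length)*1000;
    }


    // *** RETURN

    return $score;

} # end Fx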


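function get_leven_soundex($string, $compare_string)
{
    // *** DATA

    # internal
    $_leven = 0;
    $_length = 0;

    # return (1000 is the fallback score when no soundex key is produced)
    $score = 1000;


    // *** MANIPULATE

    # get soundex values
    $_snd1 = soundex($string);
    $_snd2 = soundex($compare_string);
    $_length = strlen($_snd1);

    # get levenshtein distance between the soundex keys
    $_leven = levenshtein($_snd1, $_snd2);

    # scale distance relative to key length, per 1000 (not a percentage);
    # skip the division when the key is empty to avoid dividing by zero
    if ( $_length > 0 )
    {
        $score = ($_leven/$_length)*1000;
    }


    // *** RETURN

    return $score;

} # end Fx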


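function get_leven_metaphone($string, $compare_string)
{
    // *** DATA

    # internal
    $_leven = 0;
    $_length = 0;

    # return (1000 is the fallback score when no metaphone key is produced)
    $score = 1000;


    // *** MANIPULATE

    # get metaphone values
    $_snd1 = metaphone($string);
    $_snd2 = metaphone($compare_string);
    $_length = strlen($_snd1);

    # get levenshtein distance between the metaphone keys
    $_leven = levenshtein($_snd1, $_snd2);

    # scale distance relative to key length, per 1000 (not a percentage);
    # skip the division when the key is empty to avoid dividing by zero
    if ( $_length > 0 )
    {
        $score = ($_leven/$_length)*1000;
    }


    // *** RETURN

    return $score;

} # end Fx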
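
To get a feel for how the plain and phonetic versions differ, here is a rough sketch that runs one pair of names through the same distance-over-length scaling. The helper name scaled_leven() and the empty-string fallback are my own assumptions for illustration, not part of the functions above, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: the same distance / length * 1000 scaling used in the
// functions above, applied to whatever pair of strings it is given.
function scaled_leven(string $a, string $b): float
{
    $length = strlen($a);
    if ($length === 0) {
        return 1000.0; // assumption: treat empty input as a non-match
    }
    return (levenshtein($a, $b) / $length) * 1000;
}

// Raw strings: two substitutions over six characters.
$plain = scaled_leven('Robert', 'Rupert');

// Phonetic keys: both names share the soundex key R163.
$soundex = scaled_leven(soundex('Robert'), soundex('Rupert'));

// Metaphone keys are compared the same way.
$metaphone = scaled_leven(metaphone('Robert'), metaphone('Rupert'));

printf("plain: %.1f, soundex: %.1f, metaphone: %.1f\n", $plain, $soundex, $metaphone);
```

The raw comparison scores about 333, while the soundex comparison scores 0 because both names collapse to the same key. Since a soundex key is four characters long, the soundex score can only land on 0, 250, 500, 750 or 1000.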


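The lower the score, the better the match, with 0 being exact (for the soundex and metaphone versions, 0 means the phonetic keys match, not necessarily the strings themselves). Scores are scaled to the length of the first string, so 1000 means the edit distance equals that length, and a much longer compare string can push the score past 1000. So you can set a limit for matching terms -- a common recommendation seems to be around 500 and lower.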
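
Applying a limit could look something like the sketch below. The names leven_score() and fuzzy_filter() are hypothetical, the 500 default is just the rule of thumb mentioned above, and the type hints assume PHP 7 or later. leven_score() is a condensed restatement of get_leven_score() so the snippet runs on its own.

```php
<?php
// Condensed restatement of get_leven_score(); the empty-string fallback
// is an assumption added for this sketch.
function leven_score(string $string, string $compare_string): float
{
    $length = strlen($string);
    if ($length === 0) {
        return 1000.0;
    }
    return (levenshtein($string, $compare_string) / $length) * 1000;
}

// Hypothetical helper: keep candidates scoring at or below $limit,
// returned as candidate => score with the best (lowest) score first.
function fuzzy_filter(string $needle, array $candidates, float $limit = 500.0): array
{
    $matches = [];
    foreach ($candidates as $candidate) {
        $score = leven_score($needle, $candidate);
        if ($score <= $limit) {
            $matches[$candidate] = $score;
        }
    }
    asort($matches); // sort by score, preserving the candidate keys
    return $matches;
}

$results = fuzzy_filter('kitten', ['kitten', 'sitting', 'mitten', 'elephant']);
print_r($results);
```

With these sample words, kitten (0), mitten (about 167) and sitting (500) make the cut, while elephant is dropped. Tightening the limit to something like 250 would drop sitting as well.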

Source: php.net

Keywords: PHP, SEO