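Moving into the hairy territory of fuzzy searching after the person I was supposed to be working with let me down. Three basic functions, all built on levenshtein(): one compares the raw strings, one compares their soundex() codes, and one compares their metaphone() codes: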
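function get_leven_score($string, $compare_string)
{
    // *** DATA
    # internal
    $_leven = 0;
    $_len = strlen($string);
    # return (default when nothing better can be computed)
    $score = 1000;
    // *** MANIPULATE
    # empty reference string: avoid division by zero
    if ($_len === 0) {
        return ($compare_string === '') ? 0 : $score;
    }
    # get levenshtein distance
    $_leven = levenshtein($string, $compare_string);
    # convert to a per-mille (not percentage) score relative to the length of $string
    $score = ($_leven/$_len)*1000;
    // *** RETURN
    return $score;
} # end Fx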
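function get_leven_soundex($string, $compare_string)
{
    // *** DATA
    # internal
    $_leven = 0;
    # return (default when nothing better can be computed)
    $score = 1000;
    // *** MANIPULATE
    # get soundex values
    $_snd1 = soundex($string);
    $_snd2 = soundex($compare_string);
    # no soundex code for $string: avoid division by zero
    if (strlen($_snd1) === 0) {
        return (strlen($_snd2) === 0) ? 0 : $score;
    }
    # get levenshtein distance between the two codes
    $_leven = levenshtein($_snd1, $_snd2);
    # convert to a per-mille (not percentage) score relative to the length of the soundex code
    $score = ($_leven/strlen($_snd1))*1000;
    // *** RETURN
    return $score;
} # end Fx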
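function get_leven_metaphone($string, $compare_string)
{
    // *** DATA
    # internal
    $_leven = 0;
    # return (default when nothing better can be computed)
    $score = 1000;
    // *** MANIPULATE
    # get metaphone values
    $_snd1 = metaphone($string);
    $_snd2 = metaphone($compare_string);
    # no metaphone key for $string: avoid division by zero
    if (strlen($_snd1) === 0) {
        return (strlen($_snd2) === 0) ? 0 : $score;
    }
    # get levenshtein distance between the two keys
    $_leven = levenshtein($_snd1, $_snd2);
    # convert to a per-mille (not percentage) score relative to the length of the metaphone key
    $score = ($_leven/strlen($_snd1))*1000;
    // *** RETURN
    return $score;
} # end Fx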
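The lower the score, the better the match, with 0 being exact (for the soundex and metaphone versions, 0 means the phonetic codes are identical, not necessarily the strings themselves). The score is the edit distance per character of the first string, times 1000, so it can climb above 1000 when the second string is much longer than the first. So you can set a limit for matching terms: a common recommendation seems to be around 500 and lower, i.e. an edit distance of at most half the length of the first string.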
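As a rough sketch of how such a limit might be applied, the snippet below filters a list of candidate terms against a search term and sorts the survivors best match first. The helper name fuzzy_filter, its $limit parameter, and the sample words are all assumptions made up for illustration. A compact copy of get_leven_score is included only so the snippet runs on its own.

```php
<?php
// Compact stand-in for get_leven_score() so this snippet runs standalone.
if (!function_exists('get_leven_score')) {
    function get_leven_score($string, $compare_string)
    {
        $len = strlen($string);
        if ($len === 0) {
            return ($compare_string === '') ? 0 : 1000;
        }
        return (levenshtein($string, $compare_string) / $len) * 1000;
    }
}

// Hypothetical helper (not from php.net): keep candidates whose score is
// at or below $limit, sorted with the lowest (best) score first.
function fuzzy_filter($term, array $candidates, $limit = 500)
{
    $matches = array();
    foreach ($candidates as $candidate) {
        $score = get_leven_score($term, $candidate);
        if ($score <= $limit) {
            // note: purely numeric candidates would become integer keys
            $matches[$candidate] = $score;
        }
    }
    asort($matches); // sort by score, keeping candidate => score pairs
    return $matches;
}

// Sample data, assumed for illustration only.
$results = fuzzy_filter('apple', array('banana', 'maple', 'aple', 'apple'));
print_r(array_keys($results)); // apple, aple, maple ("banana" is over the limit)
```

Swapping get_leven_score for get_leven_soundex or get_leven_metaphone inside the loop would give a sounds-alike filter instead, though the 500 limit would probably need retuning because the phonetic codes are so short.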
Source: php.net
Keywords: PHP, SEO