
Phosphorus and Lime

A Developer's Broadsheet

This blog has been deprecated. Please visit my new blog at klenwell.com/press.
PHP: sound, text, and fuzzy searching
I've been trying to develop a good rhyming algorithm, which has led me to soundex, metaphone, etc.

I found the following comment on searching interesting:

The soundex 'different letter in front' problem can be solved by using levenshtein() on the soundex codes. In my application, which searches a database of album names for entries that match a particular user-provided string, I do the following:

1. Search the database for the exact name
2. Search the database for entries where the name occurs anywhere as a substring
3. Search the database for entries where any of the words in the name (if the user has typed in more than one word) is present, except for little words (and, the, of, etc.)
4. Then, if all this fails, I go to plan B:

- calculate the levenshtein distance (levenshtein()) between the user search term and each of the entries in the database as a percentage of the length of the user search term entered

- calculate the levenshtein distance between the metaphone codes of the user search term entered and each field in the database as a percentage of the length of the metaphone code of the user search term entered

- calculate the levenshtein distance between the soundex codes of the user search term entered and each field in the database as a percentage of the length of the soundex code of the original user search term entered

If any of these percentages is less than 50 (which means that two soundex codes with different first letters will be accepted!), then the entry is accepted as a possible match.


source: soundex (php.net)
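
To make "plan B" concrete, here is a minimal sketch of how I read it, using PHP's built-in levenshtein(), metaphone(), and soundex(). The function names distance_percent() and is_fuzzy_match() are my own inventions, not the commenter's code, and lowercasing both strings before comparing is my assumption. Note why the 50 cutoff lets mismatched first letters through: a soundex code is 4 characters, so a single differing letter scores 1/4 = 25%.

```php
<?php
// Hypothetical helper: Levenshtein distance expressed as a percentage
// of the length of $needle (the user's search term or its phonetic code).
function distance_percent(string $needle, string $candidate): float
{
    $len = strlen($needle);
    if ($len === 0) {
        return 100.0; // nothing to compare against; treat as "no match"
    }
    return levenshtein($needle, $candidate) / $len * 100;
}

// Hypothetical "plan B" check: accept $entry as a possible match if any
// of the three percentages is below the threshold (50 in the comment).
function is_fuzzy_match(string $term, string $entry, float $threshold = 50.0): bool
{
    // Assumption: comparison is case-insensitive.
    $term  = strtolower($term);
    $entry = strtolower($entry);

    $scores = [
        // raw strings
        distance_percent($term, $entry),
        // metaphone codes
        distance_percent(metaphone($term), metaphone($entry)),
        // soundex codes
        distance_percent(soundex($term), soundex($entry)),
    ];

    return min($scores) < $threshold;
}
```

In use, you would loop over the candidate rows and keep those for which is_fuzzy_match($userTerm, $row['name']) returns true (the 'name' column is just an example). Because every row has to be scored in PHP, this only makes sense as a last resort after the cheaper exact and substring searches have come up empty, which is how the commenter orders things.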

See also: fuzzy searching

keywords: PHP, soundex, phonetics, text, fuzzy, search