
Recently, while working with one of our clients, we were asked to improve their site’s search so that it could handle misspelled or “sound-alike” words. In other words, even if someone typed a product name incorrectly, the search should still show the right results.
For example, if a user searched for sampoo or sampo, the system should still return shampoo.
Normally, the go-to solution for such a feature would be Apache Solr, a powerful search platform. But in this case, we had only about 20 products on the site, and given the limited time and budget, Solr felt like overkill. Instead, we looked for a lightweight, custom approach that could work just as well for a small catalog.
The Challenge: Different Languages and Special Characters
One hurdle we faced was that many products had names available in multiple languages. When written in the Latin alphabet, these names often included accented or special characters, for example shampó or shampū.
The problem? A plain string comparison doesn't treat these as the same word as shampoo.
To fix this, we first cleaned up the words, both the user's search term and the product titles, by removing accents and converting everything to plain ASCII. That way, shampó and shampū became shampo and shampu, which the sound-based matching described next treats the same as shampoo.
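The clean-up step can be sketched in PHP roughly as follows. This is a minimal sketch, assuming the intl extension is installed; the function name normalize_term and the lowercasing step are illustrative assumptions rather than part of our original code.

```php
<?php
// Hypothetical helper (the name is illustrative): reduce a word to plain,
// lowercase ASCII so accented spellings can be compared with plain ones.
// Requires PHP's intl extension for transliterator_transliterate().
function normalize_term(string $text): string
{
    // Convert any script to Latin, strip diacritics, then drop whatever
    // is still outside the ASCII range.
    $ascii = transliterator_transliterate(
        'Any-Latin; Latin-ASCII; [:^ASCII:] Remove;',
        $text
    );
    if ($ascii === false) {
        // Transliteration failed; return an empty string instead of false.
        return '';
    }
    // Lowercasing is an extra assumption; Soundex itself ignores case.
    return strtolower(trim($ascii));
}

echo normalize_term('Shampó'), "\n"; // shampo
echo normalize_term('shampū'), "\n"; // shampu
```

Note that the normalized spellings are still not identical to shampoo; normalization only removes the characters that would otherwise get in the way of the sound-based comparison.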
Making Words Match by Sound
Once we had “cleaned” words, the next step was making sure the search could match words that sound similar but are spelled differently. This is where we used an algorithm called Soundex.
Soundex turns a word into a four-character code, its first letter followed by three digits, based on how the word is pronounced in English. Words that sound alike usually end up with the same code. For example:
- Shampoo → S510
- Sampo → S510
- Sampoo → S510
Since all these versions produce the same code, the system knows they should be treated as the same word.
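These codes can be checked with PHP's built-in soundex() function. The snippet below is a small illustration; the helper name sounds_alike is hypothetical and only there to make the comparison explicit.

```php
<?php
// Hypothetical helper (the name is illustrative): two words "sound alike"
// when their Soundex codes are identical.
function sounds_alike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

// soundex() is part of PHP core, so no extension is needed here.
foreach (['Shampoo', 'Sampo', 'Sampoo'] as $word) {
    printf("%-8s => %s\n", $word, soundex($word)); // each prints S510
}

var_dump(sounds_alike('sampoo', 'shampoo'));  // bool(true)
var_dump(sounds_alike('champoo', 'shampoo')); // bool(false)
```

The last line shows a limitation worth knowing about: Soundex keeps the first letter of the word, so a misspelling that changes the first letter (champoo gives C510) will not match.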
The Outcome
By combining two simple steps—
- Normalizing characters (removing accents and special letters), and
- Using Soundex (matching words by sound),
—we were able to build a reliable misspelling-friendly search feature.
The best part? It worked for both English and translated product names, without the need for heavy tools like Solr. This lightweight approach was a perfect fit for a small product catalog and delivered exactly what the client wanted: a smarter, more forgiving search experience.
Sample Code We Used to Normalize and Match
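// Remove accents and other non-ASCII characters (requires the PHP intl extension).
$remove_accents = function ($string) {
  return transliterator_transliterate('Any-Latin; Latin-ASCII; [:^ASCII:] Remove;', $string);
};
// Assumed inputs: $search_term is the user's text; $products is a list of
// rows, each with 'nid' and 'title' keys (the 'title' key is an assumption).
$nids = [];
$input_words = preg_split('/\s+/', (string) $remove_accents($search_term), -1, PREG_SPLIT_NO_EMPTY);
foreach ($products as $value) {
  $title_words = preg_split('/\s+/', (string) $remove_accents($value['title']), -1, PREG_SPLIT_NO_EMPTY);
  // Compare each word of the input with each word of the product title.
  foreach ($input_words as $input_word) {
    foreach ($title_words as $title_word) {
      // Words that sound alike share the same Soundex code.
      if (soundex($input_word) === soundex($title_word)) {
        $nids[] = $value['nid'];
        // One matching word is enough; stop so the NID is added only once.
        break 2;
      }
    }
  }
}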
To make the search results usable, we first retrieved the product title along with its corresponding Node ID (NID) from the database. After running our comparison logic with the user’s search term, we identified which product titles matched phonetically. Each successful match was stored in an array containing the relevant NIDs. Finally, this array of matched results was passed to the frontend through a custom API, ensuring that the search suggestions displayed to the user were accurate and dynamically generated.
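To make that flow concrete, here is a self-contained sketch of the whole round trip. It is not our production code: the function names, the sample products, and the array keys are illustrative assumptions, an in-memory array stands in for the database query, and json_encode() stands in for the custom API response. It assumes the intl extension is available.

```php
<?php
// Sketch only: names, sample data, and array keys are illustrative assumptions.

// Strip accents and non-ASCII characters, then split the text into words.
function to_words(string $text): array
{
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; [:^ASCII:] Remove;', $text);
    return preg_split('/\s+/', (string) $ascii, -1, PREG_SPLIT_NO_EMPTY);
}

// Return the NIDs of products with at least one title word that shares
// a Soundex code with any word of the search term.
function find_matching_nids(string $search_term, array $products): array
{
    // Soundex codes of the search words, stored as array keys for fast lookup.
    $wanted = [];
    foreach (to_words($search_term) as $word) {
        $wanted[soundex($word)] = true;
    }

    $nids = [];
    foreach ($products as $product) {
        foreach (to_words($product['title']) as $title_word) {
            if (isset($wanted[soundex($title_word)])) {
                $nids[] = $product['nid'];
                break; // one matching word is enough for this product
            }
        }
    }
    return $nids;
}

// Stand-in for the rows fetched from the database.
$products = [
    ['nid' => 101, 'title' => 'Herbal Shampó'],
    ['nid' => 102, 'title' => 'Body Lotion'],
    ['nid' => 103, 'title' => 'Shampū for Kids'],
];

// Stand-in for the API response sent to the frontend.
echo json_encode(['nids' => find_matching_nids('sampoo', $products)]), "\n";
// {"nids":[101,103]}
```

Computing the Soundex codes of the search words once, instead of inside the inner loop, keeps the comparison cheap; for a catalog of about 20 products either approach is fast enough.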