
PHP Levenshtein() Function

The Levenshtein distance between two strings is the minimum number of single-character insertions, replacements (updates), and deletions required to turn one string into the other. PHP has a built-in function named levenshtein() that calculates the Levenshtein distance between two strings. This function compares the strings in a case-sensitive way. PHP has another function named similar_text() that also measures how similar two strings are, but it uses a different algorithm: levenshtein() runs in O(m*n) time for strings of lengths m and n, while similar_text() runs in O(max(m,n)^3) time, so levenshtein() is usually much faster. The different uses of the levenshtein() function are shown in this tutorial.
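To make the definition concrete, here is a minimal sketch of the classic dynamic-programming (Wagner-Fischer) approach with all three costs equal to 1. The function name lev_distance_dp() is hypothetical and only illustrates the idea; in real scripts, the built-in levenshtein() should be used. Like levenshtein(), the sketch compares bytes, so a multibyte UTF-8 character counts as several characters.

```php
<?php
// Hypothetical helper: a plain dynamic-programming sketch of the
// Levenshtein distance with unit insertion, replacement, and deletion costs.
function lev_distance_dp(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);

    // $prev[$j] = distance between the first ($i - 1) bytes of $a
    // and the first $j bytes of $b (the previous row of the table).
    $prev = range(0, $n); // turning "" into $j bytes needs $j insertions

    for ($i = 1; $i <= $m; $i++) {
        $curr = [$i]; // turning $i bytes into "" needs $i deletions
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // delete $a[$i - 1]
                $curr[$j - 1] + 1,     // insert $b[$j - 1]
                $prev[$j - 1] + $cost  // replace, or keep a matching byte
            );
        }
        $prev = $curr;
    }

    return $prev[$n];
}

echo lev_distance_dp('Fool', 'Feel'), PHP_EOL; // 2
echo levenshtein('Fool', 'Feel'), PHP_EOL;     // 2
```

Only two rows of the table are kept in memory at a time, which is enough because each cell depends only on the current and previous rows.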

Syntax:
int levenshtein(string $string1, string $string2, int $insertion_cost = 1, int $replacement_cost = 1, int $deletion_cost = 1)

This function has five arguments. The first and second arguments are mandatory, and the other three arguments are optional. The purposes of these five arguments are described below:

  • $string1: The first string, which is compared with the second argument.
  • $string2: The second string, which is compared with the first argument.
  • $insertion_cost: The cost of inserting one character (default 1).
  • $replacement_cost: The cost of replacing one character (default 1).
  • $deletion_cost: The cost of deleting one character (default 1).

This function returns the Levenshtein distance between the first and second argument values. In PHP versions before 8.0, it returns -1 if either string is longer than 255 characters, and it must be called with either two or all five arguments; PHP 8.0 removed both restrictions.
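The following sketch shows how the optional cost arguments change the result. The costs are counted while transforming $string1 into $string2, so characters added to $string1 use $insertion_cost and characters removed from it use $deletion_cost. All five arguments are passed so the calls also work in PHP versions before 8.0; the example strings are only illustrations.

```php
<?php
// Default costs (1, 1, 1): replace each 'o' with 'e'.
echo levenshtein('Fool', 'Feel'), PHP_EOL;          // 2

// Expensive replacement (5): deleting and then inserting a character
// (1 + 1) is now cheaper than replacing it, so the result is 2 * 2.
echo levenshtein('Fool', 'Feel', 1, 5, 1), PHP_EOL; // 4

// Unequal insertion and deletion costs make the result asymmetric.
echo levenshtein('cat', 'cats', 3, 1, 1), PHP_EOL;  // 3 (one insertion)
echo levenshtein('cats', 'cat', 3, 1, 1), PHP_EOL;  // 1 (one deletion)
```

Because the insertion and deletion costs differ in the last two calls, swapping the two strings changes the result.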

Different Examples of the levenshtein() Function

The different uses of the levenshtein() function are shown in this part of the tutorial using multiple examples.

Example 1: Compare Two Strings of a Single Word

Create a PHP file with the following script that calculates the Levenshtein distance between two single words using the levenshtein() function and prints the result.

<?php

//Define the first string
$str1 = 'Fool';
//Define the second string
$str2 = 'Feel';
//Calculate the Levenshtein distance
echo "<h3> The levenshtein distance is ".levenshtein($str1, $str2)."</h3>";

?>

Output:

After executing the previous script, the output is “The levenshtein distance is 2”. Two characters must be replaced (each “o” with “e”) to turn “Fool” into “Feel”, so the Levenshtein distance between these two words is 2.
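Since levenshtein() is case-sensitive, “Fool” and “fool” are not treated as equal. One simple way to compare strings case-insensitively, assuming plain ASCII text, is to lowercase both strings with strtolower() before calling levenshtein(), as in this short sketch (for non-ASCII text, mb_strtolower() would be the safer choice):

```php
<?php
$str1 = 'Fool';
$str2 = 'fool';

// 'F' and 'f' are different bytes, so one replacement is needed.
echo levenshtein($str1, $str2), PHP_EOL;                         // 1

// Lowercasing both strings first ignores the difference in case.
echo levenshtein(strtolower($str1), strtolower($str2)), PHP_EOL; // 0
```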

Example 2: Compare Two Strings of Multiple Words

Create a PHP file with the following script that calculates the Levenshtein distance between two strings of multiple words using the levenshtein() function and prints the result. Here, the first string contains three words, the second string contains two words, and one word is common to both strings.

<?php

//Define the first string
$str1 = 'PHP Programming language';
//Define the second string
$str2 = 'Java Programming';
//Calculate the levenshtein distance
echo "<h3> The levenshtein distance is ".levenshtein($str1, $str2)."</h3>";

?>

Output:

After executing the previous script, the output is “The levenshtein distance is 13”. Here, the first string is “PHP Programming language” and the second string is “Java Programming”. The word “Programming” is common to both strings. Turning “PHP” into “Java” takes 4 edits (3 replacements and 1 insertion), and 9 characters (the space and “language”) must be deleted from the end of the first string. So, the Levenshtein distance is 4 + 9 = 13.
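The arithmetic above can be checked by calculating each part separately. The split into two parts is only an illustration: it works here because “ Programming” lines up in both strings, but in general the sum of such partial distances is only an upper bound on the full distance.

```php
<?php
// "PHP" -> "Java": 3 replacements + 1 insertion
$part1 = levenshtein('PHP', 'Java');   // 4
// " language" (a space plus 8 letters) -> "": 9 deletions
$part2 = levenshtein(' language', ''); // 9

echo $part1 + $part2, PHP_EOL;                                             // 13
echo levenshtein('PHP Programming language', 'Java Programming'), PHP_EOL; // 13
```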

Example 3: Search the Exact or Closest Match in the Array

Create a PHP file with the following script that searches an array for a particular string. The script finds the array value that matches the search string exactly or, failing that, most closely, by calculating the Levenshtein distance between the search string and each element of the array. The search string is taken from the URL parameter src; if no URL parameter is given, the default value “Java” is used. Before the array is scanned, $short_distance is set to -1 to mark that no distance has been calculated yet.

The first foreach loop prints the existing values of the array. The second foreach loop calculates the Levenshtein distance between the search string and each array value. If the distance becomes 0 in any iteration, the array contains an exact match and the loop stops. Otherwise, the array value with the lowest Levenshtein distance is reported as the closest match (if several values share the lowest distance, the last one wins because the comparison uses <=).

<?php
//Set the search value from the URL parameter 'src', or use "Java" by default
$search = (isset($_GET['src']) && is_string($_GET['src'])) ? $_GET['src'] : "Java";

//Set the default distance value (-1 means no distance calculated yet)
$short_distance = -1;

//Declare an array
$languages = array('PHP','PERL','Python','Bash','Java','C++','C#','Java');
echo "Array values are:<br/>";
foreach ($languages as $lang) {
    echo $lang."<br/>";
}

//Escape the user-supplied value before printing it as HTML
echo "Search word: <b>".htmlspecialchars($search)."</b>";

//Search the exact or closest value in the array that matches the search value
foreach ($languages as $language) {

    //Calculate the Levenshtein distance
    $lev_distance = levenshtein($search, $language);

    //Check for an exact match
    if ($lev_distance == 0) {
        $short_distance = 0;
        echo "<br/>The exact match is found.";
        break;
    }

    //Keep the closest match found so far
    if ($lev_distance <= $short_distance || $short_distance < 0) {
        //Reset the short distance
        $short_distance = $lev_distance;
        //Reset the closest value
        $close_value = $language;
    }
}

//Print the closest matched value
if ($short_distance > 0) {
    echo "<br/>The closest value of the search word is ".$close_value;
}

?>

Output:

If the script is run without a URL parameter, the default search value “Java” is used. Because “Java” exists in the array, the Levenshtein distance becomes 0 when it is compared with the fifth element of the array, which is also “Java”, so the script prints “The exact match is found.”

If the search value “Python” is given in the URL parameter (for example, ?src=Python), the search value exists in the array. The Levenshtein distance becomes 0 when it is compared with the third element of the array, which is also “Python”, so the script prints “The exact match is found.”

If the search value “Python3” is given in the URL parameter, no element matches it exactly. The closest element is “Python”, the third element of the array: deleting the trailing “3” turns “Python3” into “Python”, so the Levenshtein distance is 1. The script prints “The closest value of the search word is Python”.
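The search logic of Example 3 can also be packaged as a reusable function. The sketch below is one possible variant, not the tutorial's original code: the name closest_match() is hypothetical, it returns both the closest word and its distance (or null for an empty array), and on ties it keeps the first matching value instead of the last one, because it uses < rather than <=.

```php
<?php
// Hypothetical helper: find the array value closest to $search.
// Returns ['word' => ..., 'distance' => ...], or null if $words is empty.
function closest_match(string $search, array $words): ?array
{
    $best = null;
    foreach ($words as $word) {
        $distance = levenshtein($search, $word);
        // Strict < keeps the first value when several share a distance
        if ($best === null || $distance < $best['distance']) {
            $best = ['word' => $word, 'distance' => $distance];
            if ($distance === 0) {
                break; // exact match: nothing can be closer
            }
        }
    }
    return $best;
}

$languages = ['PHP', 'PERL', 'Python', 'Bash', 'Java', 'C++', 'C#'];
$result = closest_match('Python3', $languages);
echo "{$result['word']} ({$result['distance']})", PHP_EOL; // Python (1)
```

Returning the distance as well lets the caller decide whether a match is close enough to be useful, for example by ignoring suggestions above a chosen threshold.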

Conclusion

The different uses of the levenshtein() function shown in this tutorial will help new PHP users understand the purpose of this function and use it properly in their scripts.

About the author

Fahmida Yesmin

I am a trainer of web programming courses. I like to write articles and tutorials on various IT topics. I have a YouTube channel, Tutorials4u Help, where many types of tutorials based on Ubuntu, Windows, Word, Excel, WordPress, Magento, Laravel, etc. are published.