Calculation of Hamming Distance in Python

In this lesson, you will learn how to compute the Hamming distance in Python. In machine learning, the Hamming distance between two vectors is the number of corresponding positions at which their elements differ. By the end of the lesson, you will know what the Hamming distance is and how to use it, how to calculate it with scipy, how to compute it between binary and numerical arrays, and how to compute it between strings. But first, let's define what the Hamming distance is.

What is Hamming Distance?

The Hamming distance is a metric for comparing two binary data strings. When two binary strings of equal length are compared, the Hamming distance is the number of bit positions in which they differ. It is used for error detection and error correction when data is sent across computer networks. It is also used in coding theory to compare data words of equal length.
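As a minimal sketch of this definition, the snippet below counts differing bit positions in plain Python, without any library. The function name hamming_distance and the two sample bit strings are assumptions made up for illustration.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # zip() pairs up the characters; True counts as 1 when summed
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


word_sent = "1011101"      # hypothetical transmitted word
word_received = "1001001"  # hypothetical received word

distance = hamming_distance(word_sent, word_received)
print(distance)  # 2
```

The two words differ in their third and fifth bits, so the distance is 2.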

The Hamming distance is frequently used in machine learning to compare texts or binary vectors. For example, it can be used to measure how different two strings are. The Hamming distance is also frequently used with one-hot encoded data, which is usually represented as binary vectors (bit strings). One-hot encoded vectors are well suited to the Hamming distance because, for a given set of categories, they always have equal length.
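To illustrate the one-hot case, here is a small sketch in plain Python. The helpers one_hot and count_differences and the color categories are hypothetical and exist only for this example.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1 at the given index."""
    return [1 if i == index else 0 for i in range(size)]


def count_differences(u, v):
    """Number of positions at which two equal-length lists differ."""
    return sum(a != b for a, b in zip(u, v))


categories = ["red", "green", "blue"]  # made-up categories
encoded = {name: one_hot(i, len(categories)) for i, name in enumerate(categories)}

print(encoded["red"])    # [1, 0, 0]
print(encoded["green"])  # [0, 1, 0]
print(count_differences(encoded["red"], encoded["green"]))  # 2
print(count_differences(encoded["red"], encoded["red"]))    # 0
```

Two different one-hot vectors always differ in exactly two positions (the two places where each holds its 1), so the distance here only signals whether the categories match.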

Example 1:

In this example, we use scipy to compute the Hamming distance in Python. To find the Hamming distance between two vectors, use the hamming() function from the scipy library. This function lives in the scipy.spatial.distance module, which also includes other useful distance functions.

Start with two lists of values of equal length. Import hamming() from scipy.spatial.distance and pass the val_one and val_two arrays as input parameters. The function returns the Hamming distance as the fraction of positions that differ, which can then be multiplied by the array length to get the actual count.

from scipy.spatial.distance import hamming

val_one = [20, 40, 50, 50]
val_two = [20, 40, 50, 60]

# hamming() returns the fraction of positions that differ
dis = hamming(val_one, val_two)
print(dis)

The function returns 0.25 in this case.

But how do we interpret this figure? The value is the fraction of positions at which the two lists differ. To find the number of differing positions, multiply this value by the list length:

from scipy.spatial.distance import hamming

val_one = [20, 40, 50, 50]
val_two = [20, 40, 50, 60]

dis = hamming(val_one, val_two) * len(val_one)
print(dis)

Multiplying the returned value by the length of the list gives 1.0, because the lists differ in exactly one position.
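To see where 0.25 and 1.0 come from, the following sketch reproduces both numbers by hand in plain Python. It is only an illustration of the arithmetic, not scipy's actual implementation; the names mismatches and fraction are assumptions.

```python
val_one = [20, 40, 50, 50]
val_two = [20, 40, 50, 60]

# Count the positions where the paired values are not equal
mismatches = sum(a != b for a, b in zip(val_one, val_two))

# Dividing by the length gives the fraction of differing positions
fraction = mismatches / len(val_one)

print(mismatches)  # 1
print(fraction)    # 0.25
```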

Example 2:

Now, we will calculate the Hamming distance between two integer vectors. Assume we have two vectors 'x' and 'y' with the values [4,3,4,3,7] and [2,2,3,3,3], respectively. The Hamming distance can be calculated with the Python code below. The code imports hamming() from scipy and passes the 'x' and 'y' arrays as input parameters. The function returns the fraction of positions that differ, which is multiplied by the array length to get the actual distance.

from scipy.spatial.distance import hamming

x = [4,3,4,3,7]
y = [2,2,3,3,3]

dis = hamming(x,y) * len(x)
print(dis)

The code above prints 4.0, since the two vectors differ in every position except the fourth.
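When the count alone is not enough, it can help to see which positions differ. The sketch below does this in plain Python for the same two vectors; the variable name differing is an assumption for illustration.

```python
x = [4, 3, 4, 3, 7]
y = [2, 2, 3, 3, 3]

# Collect the zero-based indices where the paired values are not equal
differing = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]

print(differing)       # [0, 1, 2, 4]
print(len(differing))  # 4
```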

Example 3:

In this section of the article, you will learn how to calculate the Hamming distance between two binary arrays. It is computed in the same way as the Hamming distance between two numerical arrays. It is worth noting that the Hamming distance only considers whether the items at each position differ, not by how much they differ. Consider the following example of computing the Hamming distance between two binary arrays in Python. The val_one array contains [0,0,1,1,0] and the val_two array contains [1,0,1,1,1].

from scipy.spatial.distance import hamming

val_one = [0, 0, 1, 1, 0]
val_two = [1, 0, 1, 1, 1]

dis = hamming(val_one, val_two) * len(val_one)
print(dis)

The Hamming distance is 2 in this case, since only the first and last items differ.
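For binary data that is stored as integers rather than lists, a common alternative is to XOR the two values and count the 1 bits. The sketch below applies that idea to the same arrays; bits_to_int is a hypothetical helper introduced only for this example.

```python
val_one = [0, 0, 1, 1, 0]
val_two = [1, 0, 1, 1, 1]


def bits_to_int(bits):
    """Pack a list of 0/1 values into an integer, first item as the highest bit."""
    return int("".join(str(bit) for bit in bits), 2)


# XOR leaves a 1 bit exactly where the two inputs differ
xor = bits_to_int(val_one) ^ bits_to_int(val_two)

# Counting the 1 bits gives the Hamming distance
distance = bin(xor).count("1")
print(distance)  # 2
```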

Example 4:

Calculating the difference between strings is a popular application of the Hamming distance. Because the function expects array-like structures, any strings we want to compare must first be converted to arrays. The built-in list() function, which turns a string into a list of characters, can be used for this. To show how different two strings are, let's compare them. The code below defines two strings, 'catalog' and 'America', compares them, and displays the result.

from scipy.spatial.distance import hamming

first_str = 'catalog'
second_str = 'America'

dis = hamming(list(first_str), list(second_str)) * len(first_str)
print(dis)

The output of the above Python code is 7.0, because the two strings differ at every one of their seven positions.
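To check that result by eye, the short sketch below pairs up the characters of the two strings in plain Python. The names pairs and mismatched are assumptions for illustration.

```python
first_str = "catalog"
second_str = "America"

# Pair the characters position by position
pairs = list(zip(first_str, second_str))

# Keep only the pairs whose characters are not equal (comparison is case-sensitive)
mismatched = [(a, b) for a, b in pairs if a != b]

print(pairs[0])         # ('c', 'A')
print(len(mismatched))  # 7
```

Since the number of mismatched pairs equals the string length, this pair of strings has the largest Hamming distance possible for seven characters.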

Always remember that the arrays must be of the same length, because elements can only be compared position by position. scipy raises a ValueError if we try to compare strings of unequal lengths. Take a look at the code below.

from scipy.spatial.distance import hamming

first_str = 'catalog'
second_str = 'distance'

dis = hamming(list(first_str), list(second_str)) * len(first_str)
print(dis)

Here, the code raises a ValueError because the two strings differ in length.
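One way to avoid the exception is to check the lengths before comparing. The sketch below shows that idea in plain Python; safe_hamming is a hypothetical helper, and returning None for mismatched lengths is just one possible design choice.

```python
def safe_hamming(first, second):
    """Return the number of differing positions, or None if the lengths differ."""
    if len(first) != len(second):
        return None
    return sum(a != b for a, b in zip(first, second))


print(safe_hamming("catalog", "America"))   # 7
print(safe_hamming("catalog", "distance"))  # None
```

Returning None keeps a batch of comparisons running when some inputs are mismatched, at the cost of the caller having to check for it. Raising an error, as scipy does, makes the problem harder to overlook.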

Conclusion

In this tutorial, you learned how to compute the Hamming distance in Python. When two strings or arrays of equal length are compared, the Hamming distance is the number of positions at which their elements differ. It is frequently used in machine learning to compare strings and one-hot encoded arrays. Finally, you learned how to use the scipy library to calculate the Hamming distance.

About the author

Kalsoom Bibi

Hello, I am a freelance writer and usually write about Linux and other technology-related topics.