PostgreSQL

PostgreSQL Fuzzy Search

A fuzzy word leads to partial or unclear. This search deals with the result that does not see the accurate or perfect search. The fuzzy search of data in PostgreSQL contains different approaches. Either through extensions or algorithms. But the phenomenon is applied in both ways. This article will contain all possible methods to demonstrate the working and implementation of fuzzy search in PostgreSQL data.

Implementation of the Fuzzy Search

To understand the concept of fuzzy search, we need to create a table. Here, a table named painter is created with the columns of id, name, nationality, gender, and birth_year. Each column contains its data types respectively:

>> CREATE TABLE painter (id INT, name VARCHAR, nationality VARCHAR, gender VARCHAR, birth_year INT);

After the table is created, it is time to pass data in the rows of the relation to make it in a functional state. An insert statement is used to add the values in the table:

>> INSERT INTO painter (id, name, nationality, gender, birth_year) VALUES (1, 'john', 'American', 'male' , '2002'), ( 2, 'Robert', 'British', 'male' , '1990'), (3, 'Rosee', 'American', 'female' , '2000'), ( 4, 'Shomia' , 'German', 'female , '1982');

All the data will be entered as shown through the message that 4 rows are entered in the table.

The entered data can be viewed through the select statement used to fetch all the rows and columns of a relevant table:

>> SELECT * FROM painter;

All the data has been displayed. As you know, the fuzzy search is the one that causes the partial match and brings the result that is the closest one. So we will try each example of fuzzy search.

Example 1: Searching Through Wildcard Filter

In this example, we will fetch the record from the table by applying a condition on the name column. Any column can be affected by the fuzzy search. This situation can occur when you know the starting of the name, but you are not sure about the ending, so we use this operator ‘%’ at the end of the name. This is called a wildcard filter. This symbol can be for any character. So we will filter the results in which the name should start with ‘Rob’. The query contains the select statement to fetch the matching name row. Primarily while applying any condition after the where clause, we use an equal symbol ‘=’. But, as we are not sure about the results, we will use ‘like’ instead of that ‘=’ symbol:

>> SELECT * FROM painter WHERE name LIKE 'Rob%';

The resultant value shows that a single row is obtained. That matches with the name. Similarly, we will use the ‘%’ operator at both ends. So that filter will search for the painter’s name, whose name can begin or end with any combination of the characters:

>> SELECT * FROM painter WHERE name LIKE '%h%';

This command has two rows having the alphabet ‘h’ in the center. In this way, you can see that the fuzzy string matching works in PostgreSQL.

Example 2:

Using the same concept, if the word we used as input ‘RO’, ‘b’, is removed, then you will see that those rows will be fetched whose name starts with RO:

>> SELECT * FROM painter WHERE name LIKE 'Ro%';

Hence, you can see that two rows are selected. To filter the results again, we can use this operator in two ways by applying it in two columns. For example, the name that starts with the same word ‘Ro%’ will be searched, but the nationality column will be applied with the row’s filter starting with the nationality name ‘A%’.

>> SELECT * FROM painter WHERE name LIKE ‘Ro%’ AND nationality LIKE ‘A%’;

Thus, a single row will be selected according to the condition.

Example 3: Searching by Using Trigrams

PostgreSQL has a helpful extension that is used to enable the pg_trgm extension. This extension is used to bring some features of fuzzy string search. The trigrams are formed when a string is broken into a group of three letters consecutively. First, we need to install the extension in PostgreSQL. Because without enabling the extension, we cannot apply fuzzy search in different ways:

>> CREATE EXTENSION pg_trgm;

An extension is added to the database. Now, we will apply some commands used according to the installed extensions. In this sort of search, we use the keyword ‘SIMILARITY’ to search for the closest possible match by providing the probability of range that shows how similar the strings are on the scale between 0 and 1. In the following example, we will search from the table by applying a where clause having similarity like ‘johr’, so the 4th letter in the word is changed. The search will be according to the mistyped letters:

>> SELECT * FROM painter WHERE SIMILARITY (name, 'johr') > 0.4;

The result will contain the row with the column name similar to john that is john in the table.

Now, we will apply the similarity concept to the nationality column in the rows containing ‘an’ in the string present in the PostgreSQL data. The resultant will be limited to the 2 rows:

>> SELECT * FROM painter ORDER BY SIMILARITY (nationality, 'an') LIMIT 2;

The resultant will contain only two rows as we have applied the limit. So those rows having the nationality of containing ‘an’ in them will be displayed.

Example 4: Searching Through Phonetic Algorithms

Another way of searching is through using phonetic algorithms. These algorithms use the rules used to represent a string by using a shortcode. For this purpose, PostgreSQL uses an extension to be installed so that we can enable the working of the search. This extension is a fuzzy string matcher:

>> CREATE EXTENSION fuzzystrmatch;

After installing the extension, we will use any feature of the extension. For example, we will use a feature known as Metaphone.

Metaphone: This algorithm is used to return the code of different lengths. This is like the simple function we use in PostgreSQL. Similarly, we can pass arguments to this function. For example, the length is specified that returns the value to that length.

In the example, we will select id, name, and a new column with the name Metaphone that will shorten the name of the painter to form an abbreviation. The where clause will contain a condition to search for the nationality equal to American:

>> SELECT id, name, METAPHONE (name, 10) FROM painter WHERE nationality = 'american' LIMIT 2;

You can see that on execution, the Metaphone column will contain the shortened forms of the names of the painter.

Conclusion

This article throws light to the partial searching of the data by applying fuzzy strings matching. If the user is unaware of the complete names or the partial name, he can easily get the information about the record by using this type of search. We have used elementary examples regarding both the algorithms and extensions in PostgreSQL. We hope you found this article helpful. Check out other Linux Hint articles for more tips and information.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.