PostgreSQL

Postgres TSVECTOR

Full-text search refers to a feature for searching a collection of documents or a database that considers all the words that are contained in the records as opposed to traditional searching techniques that consider only some parts of the documents.

This method of searching is highly beneficial for text-heavy documents and allows for advanced search features.

In databases such as PostgreSQL, we can access the tsvector data type to implement the full-text search features.

This tutorial explores the fundamentals of working with the tsvector data types in the PostgreSQL server.

Prerequisites:

To follow along with this post, ensure that you have an installed PostgreSQL server on your machine.

PostgreSQL TSVECTOR

The tsvector data type in PostgreSQL allows us to represent the documents in a form that is optimized for text search. Think of it as a sorted list of distinct words which are known as lexemes that appear within the stored document.

The sorted list of distinct lexemes are words that are normalized to merge the different variants of the same word. So, for example, runs, running, and ran are all normalised to the lexeme run.

A tsvector provides full-text search features by indexing the particular words in the stored document.

Create the TSVECTOR

We can create a tsvector from a string by converting the text to a tsvector using the to_tsvector function.

select to_tsvector('hello world!');

This function returns a tsvector object which is a sorted list of distinct lexemes.

Example Output:

to_tsvector

---------------------

'hello':1 'world':2

(1 row)

Using the TSVECTOR for Full Text Search

One of the significant use of tsvector in PostgreSQL is the full-text search. Let us see how we can use them to implement a full-text-like feature in PostgreSQL.

Suppose we have a books table:

CREATE TABLE books (

id SERIAL PRIMARY KEY,

title TEXT,

description TEXT,

tsvector_title_desc tsvector

);

Insert the sample data into the table as follows:

INSERT INTO books (title, description)

VALUES

('Postgres Guide', 'This guide will introduce you to PostgreSQL.'),

('Advanced Postgres', 'Get deeper into PostgreSQL with this advanced guide.'),

('Python Guide', 'Learn Python from scratch with this comprehensive guide.');

To perform a full-text search, we can use the @@ operator with the ts_query function as demonstrated in the following example query:

SELECT title, description

FROM books

WHERE to_tsvector(title || ' ' || description) @@ to_tsquery('PostgreSQL');

Output:

As shown in the given output, the query returns the books where the PostgreSQL term appears in the title or the description.

Update the TSVECTOR

If we change the text that we’re searching for, we need to update the corresponding tsvector.

An example query is as follows:

UPDATE books SET tsvector_title_desc = to_tsvector(title || ' ' || description);

We would have to run this whenever we insert or update a book in the table. We can configure a trigger to do this for us automatically.

CREATE OR REPLACE FUNCTION books_tsvector_trigger() RETURNS trigger AS $$

begin

new.tsvector_title_desc := to_tsvector(new.title || ' ' || new.description);

return new;

end

$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE

ON books FOR EACH ROW EXECUTE FUNCTION books_tsvector_trigger();

PostgreSQL automatically updates the tsvector whenever we insert or update a book table.

Conclusion

We looked at how to gain the full-text search features in PostgreSQL using the tsvector data type. You can check the documentation to fully explore the full capabilities of the tsvector on your database.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list