How to Use Vector Store-Backed Retriever in LangChain?

Data is central to building effective artificial intelligence applications, but large collections of documents are difficult to manage and search. Vector stores in LangChain store documents as embeddings so that the text most relevant to a query can be found quickly. A vector store-backed retriever fetches those documents from the store using search types such as similarity search, Maximum Marginal Relevance (MMR), and similarity score threshold.

This post demonstrates the process of using vector store-backed retrievers in LangChain.

How to Use Vector Store-Backed Retrievers in LangChain?

To use a vector store-backed retriever in LangChain, follow the steps below, which cover the setup and four retrieval methods:

Prerequisite: Install Libraries and Upload Data

Before starting the process of using retrievers, simply install LangChain using the following code:

pip install langchain

 

Install the Chroma vector database using the following code which can be used to work with vector stores:

pip install chromadb

 

Install the OpenAI library using this code which can be used to embed the data:

pip install openai

 

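Install the FAISS library, which performs efficient similarity search over vectors, using the following code (the faiss-gpu package requires a CUDA-capable GPU; faiss-cpu is the CPU-only alternative):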

pip install faiss-gpu

 

After that, import the os and getpass modules and use the “getpass.getpass()” function to provide the OpenAI API key without echoing it on the screen:

import os
import getpass

os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')

 

The following screenshot displays the successful installation of the FAISS library and successful usage of the OpenAI API key:

Upload Data

After installing all the necessary modules and libraries, upload the data from the local system to the Google Colab notebook using the following code:

from google.colab import files
upload = files.upload()

 

After that, create the loader variable using the TextLoader() function. Here, “Data.txt” is the file uploaded from the system, and its contents will be stored in the vector store:

from langchain.document_loaders import TextLoader
loader= TextLoader('Data.txt')

 

The following screenshot displays that the data is successfully uploaded:

Method 1: Using a Simple Retriever

Use the following code to import the necessary classes, CharacterTextSplitter, FAISS, and OpenAIEmbeddings, and build the vector store. CharacterTextSplitter splits the data into small chunks, OpenAIEmbeddings converts each chunk into an embedding vector, and FAISS indexes those vectors so that similar chunks can be searched efficiently:

#CharacterTextSplitter splits text into small chunks
from langchain.text_splitter import CharacterTextSplitter
#FAISS vector store to index and search the embedded chunks
from langchain.vectorstores import FAISS
#OpenAIEmbeddings embeds the chunks after splitting
from langchain.embeddings import OpenAIEmbeddings

#Load the uploaded file as LangChain documents
documents = loader.load()
#Configure text splitting to create chunks of up to 1000 characters with no overlap
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
#Embed the chunks and store them in a FAISS vector store
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)
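
To make the chunk_size and chunk_overlap settings more concrete, here is a minimal sketch of fixed-size chunking in plain Python. It is a simplification and not LangChain's implementation: CharacterTextSplitter first splits on a separator and then merges the pieces up to chunk_size. The function name split_into_chunks is hypothetical and used only for this illustration:

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=0):
    """Cut text into windows of at most chunk_size characters.

    Hypothetical helper for illustration; consecutive windows share
    chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# A 2500-character text with the same settings as the article's splitter
sample = "a" * 2500
chunks = split_into_chunks(sample, chunk_size=1000, chunk_overlap=0)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

With a non-zero chunk_overlap, neighbouring chunks repeat some characters, which can help when a sentence would otherwise be cut in half at a chunk boundary.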

 

After creating the database or db variable, simply use it to make a retriever using the following code:

retriever = db.as_retriever()
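
By default, the retriever performs a similarity search: the query is embedded and the stored chunks closest to it are returned. The following is a rough sketch of that idea, assuming a toy word-count "embedding" in place of OpenAIEmbeddings and cosine similarity as the measure (the actual metric depends on the vector store). The names embed, cosine, and retrieve are hypothetical and the sample chunks are made up:

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=4):
    """Return up to k chunks most similar to the query, best first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Made-up chunks for illustration
chunks = [
    "the movie was a fun and thrilling ride",
    "the child is named Sam",
    "tickets for the show sold out quickly",
]
print(retrieve("how was the movie", chunks, k=1))
```

A real embedding model captures meaning rather than exact word overlap, so it can match a query to a chunk even when they share few words.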

 

Now, pass a query to the retriever and display the retrieved documents on the screen:

docs = retriever.get_relevant_documents("how was the movie")
docs

 

The following screenshot displays the documents stored in the docs variable:

Method 2: Using Maximum Marginal Relevance Retriever

Next, create a retriever that uses the Maximum Marginal Relevance (MMR) search type, which balances relevance to the query against diversity among the returned documents, using the following command:

retriever = db.as_retriever(search_type="mmr")

 

Reuse the docs variable to store the documents returned by this retriever:

docs = retriever.get_relevant_documents("what is the name of the child")

 

After that, check how many documents were retrieved:

len(docs)
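
To illustrate how MMR trades relevance for diversity, here is a minimal sketch of the selection loop in plain Python. It is an illustration under stated assumptions, not LangChain's code: the vectors are made-up two-dimensional embeddings, the function names are hypothetical, and lambda_mult plays the role of the weighting parameter (1.0 means pure relevance, lower values favour diversity):

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return the indices of k documents chosen greedily by MMR."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Similarity to the closest already-selected document
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.1],    # very relevant
    [1.0, 0.12],   # near-duplicate of the first
    [0.6, -0.8],   # less relevant, but different from the first
]
print(mmr(query, doc_vecs, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr(query, doc_vecs, k=2, lambda_mult=1.0))  # [0, 1]
```

With lambda_mult=0.5 the near-duplicate is skipped in favour of the more distinct document, while lambda_mult=1.0 reduces to plain relevance ranking.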

 

Method 3: Using Similarity Score Threshold Retriever

The next retriever uses the similarity score threshold search type, which returns only the documents whose similarity score is at or above the given threshold (0.5 here):

retriever = db.as_retriever(search_type="similarity_score_threshold",
                            search_kwargs={"score_threshold": .5})
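
The effect of score_threshold can be sketched in a few lines of plain Python. This assumes scores normalized to the range 0 to 1 where higher means more similar; the chunk names and scores below are made up, and filter_by_threshold is a hypothetical helper rather than a LangChain function:

```python
def filter_by_threshold(scored_chunks, score_threshold=0.5):
    """Keep (chunk, score) pairs scoring at least score_threshold, best first."""
    kept = [(chunk, score) for chunk, score in scored_chunks if score >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up similarity scores for three chunks
scored = [("chunk A", 0.82), ("chunk B", 0.47), ("chunk C", 0.50)]
print(filter_by_threshold(scored, score_threshold=0.5))
# [('chunk A', 0.82), ('chunk C', 0.5)]
```

Unlike a fixed number of results, a threshold can return no documents at all when nothing in the store is similar enough to the query.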

 

Use the following code to pass a query to the retriever that uses the similarity score threshold:

docs = retriever.get_relevant_documents("what is the name of the child")

 

The following code displays the search type that the retriever uses to fetch data from the documents:

retriever.search_type

 

The following screenshot displays the type of search used to get data from documents:

Method 4: Using Top K Search

The number of documents returned by a retriever can be limited with the “k” key in search_kwargs. The following retriever returns only the top 3 documents:

retriever = db.as_retriever(search_kwargs={"k": 3})

 

Use the following code to pass a query to this retriever:

docs = retriever.get_relevant_documents("how was the movie")

 

Use the following command to check the value of k stored in the retriever's search_kwargs:

retriever.search_kwargs
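
As a rough illustration of what k controls, the sketch below keeps only the k best-scoring chunks from a list of made-up (chunk, score) pairs. The helper name top_k is hypothetical and the scores are invented for the example:

```python
import heapq


def top_k(scored_chunks, k=4):
    """Return the k highest-scoring (chunk, score) pairs, best first."""
    return heapq.nlargest(k, scored_chunks, key=lambda pair: pair[1])


# Made-up similarity scores for four chunks
scored = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(top_k(scored, k=3))  # [('b', 0.9), ('d', 0.7), ('c', 0.5)]
```

A smaller k keeps the result focused and short, while a larger k returns more context at the cost of including less relevant chunks.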

 

That’s all about using different vector store-backed retrievers in LangChain.

Conclusion

To use a vector store-backed retriever in LangChain, install the LangChain, Chroma, FAISS, and OpenAI libraries, load and split the data, embed it into a vector store, and call as_retriever() on that store. The retriever supports several search types, including the default similarity search, Maximum Marginal Relevance (MMR), and similarity score threshold, and the number of returned documents can be set with the k value in search_kwargs. This guide demonstrated all four of these approaches.

About the author

Talha Mahmood

As a technical author, I am eager to learn about writing and technology. I have a degree in computer science which gives me a deep understanding of technical concepts and the ability to communicate them to a variety of audiences effectively.