LangChain

How to Use Retrievers in LangChain?

LangChain is a framework that allows developers to build applications powered by Large Language Models (LLMs) that can interact with humans in natural language. It provides all the necessary libraries and dependencies for building these applications.

This post will demonstrate the process of using retrievers in LangChain.

How to Use Retrievers in LangChain?

Retrievers act as an interface between models and humans: given input in natural language, they fetch the data needed to produce the desired output. Vector stores are used to store data in a form from which information can be extracted efficiently.

However, retrievers are more general than these databases. They do not store any data themselves and are only used to get or retrieve data for the user. To learn the process of building and using retrievers through LangChain, follow these steps:

Step 1: Install Modules
First, install the required modules, such as LangChain, to get the libraries and dependencies needed for the process:

pip install langchain

Install the chromadb vector store, which can be used to store data for the retriever to fetch from:

pip install chromadb

Now, install the OpenAI framework to get the libraries for creating text embeddings before building a retriever:

pip install openai

After installing all the required modules, simply set up the environment using the OpenAI API key:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Step 2: Upload Dataset
Now, execute the following code to display the “Choose Files” button and upload the document or file from the local system (this code is intended for Google Colab):

from google.colab import files
uploaded = files.upload()

Step 3: Import Libraries
Import the required libraries to build and use the retrievers in LangChain, such as “List”, “Callbacks”, and many more:

from abc import ABC, abstractmethod
from typing import Any, List
from langchain.schema import Document
from langchain.callbacks.manager import Callbacks
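
These imports are the building blocks of a custom retriever. As a minimal, self-contained sketch, the snippet below mimics the interface with simplified stand-ins: the `Document` and `BaseRetriever` classes here are toy versions of LangChain's `langchain.schema.Document` and base retriever (the real classes carry extra fields and callback plumbing), and `KeywordRetriever` is a hypothetical example, not part of the library:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    # Simplified stand-in for langchain.schema.Document
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class BaseRetriever(ABC):
    # The core retriever contract: a query in, relevant documents out.
    @abstractmethod
    def get_relevant_documents(self, query: str) -> List[Document]:
        """Return documents relevant to the query."""


class KeywordRetriever(BaseRetriever):
    """Toy retriever: returns documents containing any query word."""

    def __init__(self, docs: List[Document]):
        self.docs = docs

    def get_relevant_documents(self, query: str) -> List[Document]:
        words = query.lower().split()
        return [d for d in self.docs
                if any(w in d.page_content.lower() for w in words)]


docs = [Document("LangChain provides retrievers."),
        Document("Vector stores hold embeddings.")]
retriever = KeywordRetriever(docs)
print(retriever.get_relevant_documents("retrievers")[0].page_content)
# → LangChain provides retrievers.
```

Note that the retriever holds only references to documents; the storage itself lives elsewhere, which is exactly the separation described above.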

Step 4: One-Line Index Creation
This step creates the index for the retriever, which can be used to get data from the vector store, by importing the required libraries:

from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

Here, load the data using the TextLoader() method with the path of the file uploaded in step 2:

from langchain.document_loaders import TextLoader
loader = TextLoader('state_of_the_union.txt', encoding='utf8')

Import the VectorstoreIndexCreator class from LangChain to build an index for the database:

from langchain.indexes import VectorstoreIndexCreator

Define the index variable by calling the from_loaders() method of VectorstoreIndexCreator() with the loader variable:

index = VectorstoreIndexCreator().from_loaders([loader])

Apply the query to test the index by fetching data from the document:

query = "What did President Zelenskyy say in his speech?"
index.query(query)

Get the details of the index, such as which database holds it, using the following code:

index.vectorstore

The following code converts the index's vector store into a retriever and displays its details, such as its type and database:

index.vectorstore.as_retriever()

Use the index's query() method to ask for a summary of the document, using the source filter to restrict the search to the uploaded file:

index.query("General summary of data from this document", retriever_kwargs={"search_kwargs": {"filter": {"source": "state_of_the_union.txt"}}})

Step 5: Create Embeddings
Load the document to create its embeddings, which store the text in numerical form in the vector store:

documents = loader.load()

Start the embedding process by splitting the document with the text_splitter, using the chunk size and overlap arguments:

from langchain.text_splitter import CharacterTextSplitter
# use text_splitter to create small chunks of the document for the retriever
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
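
To make the chunk_size and chunk_overlap arguments concrete, here is a toy illustration in plain Python (a simplified sketch, not the actual CharacterTextSplitter logic): each chunk is at most chunk_size characters, and consecutive chunks share chunk_overlap characters so that no sentence is cut off without context:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int):
    # Advance by (chunk_size - chunk_overlap) so consecutive
    # chunks share chunk_overlap characters.
    chunks, start = [], 0
    step = chunk_size - chunk_overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With chunk_overlap=0, as in the article's configuration, the chunks simply tile the document without sharing any text.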

Apply the OpenAIEmbeddings() method, which can be imported from LangChain:

from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

Use the chromadb store to store the embeddings created from the document:

from langchain.vectorstores import Chroma
db = Chroma.from_documents(texts, embeddings)

Step 6: Test the Retriever
Once the embeddings are created and stored in the database, simply define the retriever variable:

retriever = db.as_retriever()
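
Under the hood, this retriever performs a nearest-neighbor search over the stored vectors. The following self-contained sketch shows the idea with a toy bag-of-words "embedding" instead of the real OpenAI embeddings (which are dense float vectors): the query is embedded the same way as the documents, and the document with the highest cosine similarity is returned:

```python
import math

def embed(text, vocab):
    # Toy stand-in "embedding": bag-of-words count vector over a vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

texts = ["the president spoke about ukraine",
         "the budget covers healthcare spending"]
vocab = sorted({w for t in texts for w in t.lower().split()})

# "Vector store": each text paired with its vector.
store = [(t, embed(t, vocab)) for t in texts]

# Retrieval: embed the query, return the closest stored text.
query_vec = embed("what did the president say about ukraine", vocab)
best = max(store, key=lambda item: cosine(item[1], query_vec))
print(best[0])
# → the president spoke about ukraine
```

Chroma performs the same search at scale, with approximate nearest-neighbor indexing instead of a linear scan.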

Build the chain using the RetrievalQA.from_chain_type() method with the OpenAI() LLM and the retriever as its arguments:

qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

Provide the input to test the retriever by passing the query variable to the qa.run() method:

query = "What did President Zelenskyy say in his speech?"
qa.run(query)

Simply customize the VectorstoreIndexCreator() using its arguments, such as the vector store class, embedding model, and text splitter:

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=Chroma,
    embedding=OpenAIEmbeddings(),
    text_splitter=CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
)

That is all about the process of getting started with retrievers in LangChain.

Conclusion

To use retrievers in LangChain, simply install the dependencies required to set up the OpenAI environment and then upload the document to test the retrievers. After that, create an index over the document so the data can be retrieved from the vector store. Then configure the embeddings for the document and run the retriever to get relevant results from the database. This post has elaborated on the process of using retrievers in LangChain.

About the author

Talha Mahmood

As a technical author, I am eager to learn about writing and technology. I have a degree in computer science which gives me a deep understanding of technical concepts and the ability to communicate them to a variety of audiences effectively.