LangChain

How to Combine Agents and Vector Stores in LangChain?

LangChain is the framework that designs language models. Massive amounts of data train these models in natural language. There are many databases or vector stores like Chroma, etc. to manage these datasets. By combining the agent and vector stores, the model performs better with data from different domains. The LangChain allows the use of many vector stores to train the language model or chatbot.

Quick Outline

This post will show:

How to Use an Agent to Return a Structured Output in LangChain

Method 1: Combining Agent with Vector Stores

Method 2: Using Agent as a Router

Method 3: Using Agent With Multi-Hop Vector Store

Conclusion

How to Use an Agent to Return a Structured Output in LangChain?

The developer uses agents to route between the databases containing training data for the models. An agent has the blueprint of the complete process by storing all the steps. The agent has the tools to perform all these activities to complete the process. The user can also use the agent to get data from different data stores to make the model diverse.

To learn the process of combining agents and vector stores in LangChain, simply follow the listed steps:

Step 1: Installing Frameworks

First, install the LangChain module and its dependencies for combining the agents and vector stores:

pip install langchain

In this guide, we are using the Chroma database which can store data in different locations or tables:

pip install chromadb

To get a better understanding of data, split the large files into smaller chunks using the tiktoken tokenizer:

pip install tiktoken

OpenAI is the module that can be used to build the large language model in the LangChain framework:

pip install openai

Step 2: OpenAI Environment

The next step here is to set up the environment using the OpenAI’s API key which can be extracted from the OpenAI official account:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Now, upload the data from the local system to the Google collaboratory in order to use it in the future:

from google.colab import files

uploaded = files.upload()

Step 3: Creating a Vector Store

This step configures the first component of our task which is a vector store for storing the uploaded data. Configuring the vector stores requires the libraries that can be imported from different dependencies of the LangChain:

from langchain.embeddings.openai import OpenAIEmbeddings

#Vector stores dependency to get the required database or vector

from langchain.vectorstores import Chroma

#Text splitter is used to convert the large text into smaller chunks

from langchain.text_splitter import CharacterTextSplitter

from langchain.llms import OpenAI

from langchain.document_loaders import WebBaseLoader

from langchain.chains import RetrievalQA


llm = OpenAI(temperature=0)

Step 4: Setting the Path

After importing the libraries, simply set the path for accessing the vector stores before storing the data in it:

from pathlib import Path

relevant_parts = []
for p in Path(".").absolute().parts:
    relevant_parts.append(p)
    if relevant_parts[-3:] == ["langchain", "docs", "modules"]:
        break
#Conditional Statement inside the loop to set the path for each database
doc_path = str(Path(*relevant_parts) / "state_of_the_union.txt")

Step 5: Loading & Splitting the Data

Now, simply load the data and split it into smaller chunks to make its readability and understandability better. Create embeddings of the data by converting the text into numbers making their vector spaces and storing it in the Chorma database:

from langchain.document_loaders import TextLoader

#Loading dataset from its path and store its smaller chunks in the database

loader = TextLoader(doc_path)

documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0)

texts = text_splitter.split_documents(documents)

#Convert text into numbers and store the embeddings in the database

embeddings = OpenAIEmbeddings()

docsearch = Chroma.from_documents(texts, embeddings, collection_name="state-of-union")

Step 6: Creating a Retriever

To combine agent and vector stores, it is required to create a retriever using the RetrievalQA() method from the LangChain framework. This retrieval method is recommended to get data from vector stores using the agents as the tool for working with the databases:

state_of_union = RetrievalQA.from_chain_type(

   llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()

)

Load another dataset to integrate the agent with multiple datasets or vector stores:

loader = WebBaseLoader("https://beta.ruff.rs/docs/faq/")

Store the ruff dataset in the chromadb after creating the smaller chunks of the data with the embedding vectors as well:

docs = loader.load()
ruff_texts = text_splitter.split_documents(docs)
ruff_db = Chroma.from_documents(ruff_texts, embeddings, collection_name="ruff")
ruff = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=ruff_db.as_retriever()
)

Method 1: Combining Agent with Vector Stores

The first method of combining both agents and vector stores to extract information is mentioned below:

Step 1: Configure Tools

Now that the vector stores are configured, moving on towards building the second component of our process i.e. agent. To create the agent for the process, import the libraries using the dependencies like agents, tools, etc.

from langchain.agents import initialize_agent
from langchain.agents import AgentType
#Getting Tools from the LangChain to build the agent
from langchain.tools import BaseTool
from langchain.llms import OpenAI
#Getting LLMMathChain from chains to build the language model
from langchain.chains import LLMMathChain
from langchain.utilities import SerpAPIWrapper
from langchain.agents import Tool

Configure the tools to be used with the agents using the QA system or retrieval configured earlier with the name and description of the tools:

tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="Provides responses to the questions related to the loaded dataset with input as a fully formed question",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Provides responses to the questions about ruff (a python linter) with input as a fully formed question",
    ),
]

Step 2: Initialize Agent

Once the tools are configured, simply set the agent in the argument of the initializa_agent() method. The agent we are using here is the ZERO_SHOT_REACT_DESCRIPTION along with the tools, llm (language model), and verbose:

agent = initialize_agent(

   tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True

)

Step 3: Test the Agent

Simply execute the agent using the run() method that contains the question in its argument:

agent.run(

"What did President Joe Biden say about kanji brown in the address"

)

The following screenshot displays the answer extracted from both the data stores using the observation stored in the memory of the agent:

Method 2: Using Agent as a Router

Another way to combine both components is by using the agent as a router and the following explains the process:

Step 1: Configure Tools

Using the agent as the router means that the RetrievalQA system will return the output directly as the tools are configured to return the output directly:

tools = [
#configuring the tools required to build the agent for getting data from the data
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="Provides responses to the questions related to the loaded dataset with input as a complete question",
        return_direct=True,
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Provides responses to the questions about ruff (a python linter) with input as a complete question",
        return_direct=True,
    ),
]

Step 2: Initialize and Test the Agent

After setting the tools simply set the agent which can be used solely as the router using the initialize_agent() method:

agent = initialize_agent(

tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True

)

Test the agent by giving the input question in the agent.run() method by executing the following command:

agent.run(

"What did President Joe Biden say about kanji brown in the address"

)

Output

The output screenshot displays that the agent has simply returned the answer to the question from the dataset extracted by the RetrievalQA system:

Method 3: Using Agent With Multi-Hop Vector Store

The third method in which the developers can combine both agent and vector stores is for the multi-hop vector store queries. The following section explains the complete process:

Step 1: Configure Tools

The first step is, as usual, the configuration of tools used for building the agents to extract data from the data stores:

tools = [
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="Provides responses to the questions related to the loaded dataset with input as a fully formed question, not referencing any pronouns from the previous conversation",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Provides responses to the questions related to the loaded dataset with input as a fully formed question, not referencing any pronouns from the previous conversation",
    ),
]

Step 2: Initialize and Test the Agent

After that, build the agent variable using the initialize_agent() method with the name of the agent:

agent = initialize_agent(

tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True

)

Run the agent using the multi-hop question that contains more than one aspect or feature as the following code block contains such a question:

agent.run(

"What tool does ruff use to run over Python notebooks and Did any of the speaker mention the tool in their address"

)

Output

The following screenshot suggests that the agent has to work through the question to understand its complexity. It has returned the answer extracted by the QA system from the multiple data stores we uploaded earlier in the process:

That’s all about how to combine agents and vector stores in LangChain.

Conclusion

To combine agents with the vector stores in LangChain, start with the installation of modules to set up the environment and load datasets. Configure the vector stores to load data by splitting it into smaller chunks first and then build the language model using the OpenAI() method. Configure the agent to integrate it with the vector store to extract data for different kinds of queries. This article has elaborated on the process of combining agents and vector stores in LangChain.

About the author

Talha Mahmood

As a technical author, I am eager to learn about writing and technology. I have a degree in computer science which gives me a deep understanding of technical concepts and the ability to communicate them to a variety of audiences effectively.