LangChain

How to Use Self-Querying with MyScale in LangChain?

MyScale is a cloud-based database that assists AI technology in building efficient models and getting data from the database. LangChain is used to create Large Language Models (LLM) as question-answering chatbots to extract data using natural language queries. The user can create their own dataset and design queries related to the data to retrieve information quickly in LangChain.

This post demonstrates the process of using self-query with MyScale in LangChain.

How to Use Self-Querying with MyScale in LangChain?

To use self-query with MyScale in LangChain, let’s look at this guide with simple steps to perform the task:

Step 1: Installing Modules

To start the process of using self-query with MyScale in LangChain, install the LangChain framework first and then move forward with the example:

pip install langchain

 

Install the OpenAI framework to use the API key of the framework. After that, LLM is used to generate text in natural language:

pip install openai

 

To create a self-query with MyScale, simply install Lark with the “clickhouse-connect” library:

pip install lark clickhouse-connect

 

Install the tiktoken tokenizer to split the text into small chunks before embedding it using the OpenAI functions:

pip install tiktoken

 

Step 2: Create MyScale Vector Store

After installing all the necessary libraries for using self-query in MyScale, simply import all the credentials for building the LLM model and accessing MyScale resources. The credentials include the OpenAI key, host, port, username, and password for the MyScale account:

import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
os.environ["MYSCALE_HOST"] = getpass.getpass("MyScale URL:")
os.environ["MYSCALE_PORT"] = getpass.getpass("MyScale Port:")
os.environ["MYSCALE_USERNAME"] = getpass.getpass("MyScale Username:")
os.environ["MYSCALE_PASSWORD"] = getpass.getpass("MyScale Password:")

 

After successfully providing the credentials for the MyScale account and OpenAI API, simply import MyScale from the vector stores of LangChain. OpenAI library is used to call the OpenAIEmbeddings() function for embedding the data from the vector store:

from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import MyScale
 
embeddings = OpenAIEmbeddings()

 

Step 3: Insert the Data into MyScale

Insert data into the MyScale database by simply creating documents for movies with their storyline and metadata like the year, rating, director, genre, etc.

docs = [
    Document(
        page_content="Earth is a million years old",
        metadata={"year": 2003, "rating": 8.7, "genre": "science fiction"},
    ),
    Document(
        page_content="Mark Boucher gets lost in space",
        metadata={"year": 2009, "director": "Ab De-Villiers", "rating": 9.2},
    ),
    Document(
        page_content="A doctor gets lost in a series of dreams",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 7.6},
    ),
    Document(
        page_content="A bunch of highly talented ladies/women are saving the world",
        metadata={"year": 2019, "director": "Sara Taylor", "rating": 8.3},
    ),
    Document(
        page_content="Toys cars are fighting for their existing at racing track",
        metadata={"year": 2000, "genre": "animated"},
    ),
    Document(
        page_content="prisoners plan to escape but caught",
        metadata={
            "year": 2009,
            "director": "Ben Ducket",
            "genre": "thriller",
            "rating": 9.9,
        },
    ),
]

vectorstore = MyScale.from_documents(
    docs,
    embeddings,
)

 

Step 4: Create Self-Querying Retriever

After inserting data in the MyScale vector store, simply create a self-query to fetch data from the database using the following code. The self-query retriever will generate the data about the film like the genre, date, director, and rating of the movie using the LLM in LangChain:

from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
#configure the retriever using the LLM in OpenAI application to get data from database
document_content_description = "Get basic info about the movie"
llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm, vectorstore, document_content_description, metadata_field_info, verbose=True
)

 

Step 5: Test Retriever

Once the retriever is created, simply run the retriever with queries written in the natural language:

retriever.get_relevant_documents("Movies about science")

 

The movies retrieved for the query mentioned in the above code are displayed in the following screenshot:

Step 6: Specifying Filter K With Retriever

The user can also configure the retriever to fetch the exact number of documents by enabling the limit:

retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    Document_content_description,
    Metadata_field_info,
    enable_limit=True,
    verbose=True,
)

 

Run the retriever with a query stating the exact number of documents to fetch from the database:

retriever.get_relevant_documents("Give me two movies about science")

 

The model has retrieved exactly two movies from the database as asked in the above prompt:

That’s all about using the self-query with MyScale in LangChain and the user can use multiple prompts in retrievers provided in this guide.

Conclusion

To use self-query with MyScale in LangChain, simply install frameworks like LangChain, OpenAI, Lark, tiktoken, etc. Accessing the MyScale database requires the connection credentials which are necessary to get on with it. After that, insert data into the MyScale database and create a self-query retriever to fetch data using prompts. This post demonstrated the process of using self-querying retrievers with MyScale in LangChain.

About the author

Talha Mahmood

As a technical author, I am eager to learn about writing and technology. I have a degree in computer science which gives me a deep understanding of technical concepts and the ability to communicate them to a variety of audiences effectively.