Markdown is a lightweight markup language that adds formatting to plain-text documents while keeping them readable by humans. LangChain is a framework for building applications on top of Large Language Models (LLMs) that interact with humans in natural language. Loading markdown documents is a useful step in giving these models access to text written in human languages such as English.
This guide will demonstrate the process of using the markdown loader in LangChain.
How to Use the Markdown Loader in LangChain?
Markdown files are stored with the “.md” extension, and their contents can be supplied to LLMs in LangChain. The framework provides the “UnstructuredMarkdownLoader” class, which loads these files from their directory so they can be used with the model.
To use the markdown loader in LangChain, simply follow this guide:
Step 1: Setup Prerequisites
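First, install the LangChain framework with the command “pip install langchain” to get the UnstructuredMarkdownLoader class.
Now, install the unstructured package with “pip install unstructured”, which LangChain relies on to parse unstructured files such as markdown.
After that, import the “UnstructuredMarkdownLoader” class from LangChain with “from langchain.document_loaders import UnstructuredMarkdownLoader” (recent LangChain releases expose it from “langchain_community.document_loaders” instead).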
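As a quick sanity check before moving on, the minimal sketch below reports which of the prerequisite packages cannot be imported in the current environment. The helper name “missing_packages” is made up for this illustration, and the default package names are simply the two installed in this step.

```python
import importlib.util


def missing_packages(names=("langchain", "unstructured")):
    """Return the names from `names` that cannot be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


def report():
    """Print a short message about the prerequisites from Step 1."""
    missing = missing_packages()
    if missing:
        print("Install these packages first:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```

Calling report() prints either the packages that still need “pip install” or a confirmation that both are available.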
Step 2: Load Data
The next step is to get the path of the markdown file. In the file browser, click on the three dots next to the file to expand its menu and then click on the “Copy path” button.
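A copied path is easy to mistype or paste incompletely, so it can help to validate it before handing it to the loader. The following is only a sketch under stated assumptions: the function name “resolve_markdown_path” and the accepted extensions are choices made for this example, not part of LangChain.

```python
from pathlib import Path


def resolve_markdown_path(raw_path):
    """Return the absolute path of a markdown file, or raise if it looks wrong."""
    path = Path(raw_path).expanduser().resolve()
    # Assumption for this sketch: markdown files end in .md or .markdown.
    if path.suffix.lower() not in {".md", ".markdown"}:
        raise ValueError(f"Not a markdown file: {path}")
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return str(path)
```

The returned string can then be stored in the variable that is passed to the loader in the next step.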
Step 3: Using MarkdownLoader
Paste the path of the markdown file into a variable named “markdown_path” and pass that variable to the markdown loader:
loader = UnstructuredMarkdownLoader(markdown_path)
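After that, call the loader's load() method to read the contents of the markdown file and store the result in a variable, as in “data = loader.load()”. The result is a list of LangChain document objects.
Print the data variable with “print(data)” to display the file contents on the screen.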
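Put together, Step 3 can be sketched as follows. The sample file, its contents, and the function names “write_sample_markdown”, “load_markdown”, and “demo” are invented for this illustration; which module exposes the loader depends on the installed LangChain version, so the sketch tries the newer location first and falls back to the older one.

```python
import tempfile
from pathlib import Path


def write_sample_markdown(directory):
    """Create a small markdown file (made-up content) and return its path."""
    path = Path(directory) / "sample.md"
    path.write_text(
        "# Sample Title\n\nA short paragraph.\n\n- first item\n- second item\n",
        encoding="utf-8",
    )
    return str(path)


def load_markdown(markdown_path):
    """Load one markdown file and return a list of LangChain Document objects."""
    # Assumption: the import location varies between LangChain versions.
    try:
        from langchain_community.document_loaders import UnstructuredMarkdownLoader
    except ImportError:
        from langchain.document_loaders import UnstructuredMarkdownLoader

    loader = UnstructuredMarkdownLoader(markdown_path)
    return loader.load()


def demo():
    """Write the sample file to a temporary directory, load it, and print it."""
    with tempfile.TemporaryDirectory() as tmp:
        data = load_markdown(write_sample_markdown(tmp))
        print(data)
```

Calling demo() requires both packages from Step 1 to be installed. To load a real file instead of the sample, pass its path to load_markdown() directly.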
Step 4: Retain Elements Using MarkdownLoader
By default, unstructured combines the elements it finds in the file into a single document while loading. To keep these elements separate, create the loader with the extra parameter called “mode” set to "elements", as in “UnstructuredMarkdownLoader(markdown_path, mode="elements")”.
Call load() again to fill the data variable with the separated contents of the file, as in “data = loader.load()”.
Print the first item of the list, “data[0]”, which holds the first of the small chunks the file was split into while loading.
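The element-retaining variant of the loader can be sketched in the same way. The function names below are hypothetical, and reading a “category” entry from each document's metadata is an assumption about what unstructured attaches in this mode, which is why the sketch uses .get() and tolerates a missing key.

```python
def load_markdown_elements(markdown_path):
    """Load a markdown file with one Document per element instead of one per file."""
    # Assumption: the import location varies between LangChain versions.
    try:
        from langchain_community.document_loaders import UnstructuredMarkdownLoader
    except ImportError:
        from langchain.document_loaders import UnstructuredMarkdownLoader

    loader = UnstructuredMarkdownLoader(markdown_path, mode="elements")
    return loader.load()


def describe_elements(docs, limit=3):
    """Return (category, text) pairs for the first `limit` documents."""
    # metadata.get() yields None if no "category" entry is present.
    return [(doc.metadata.get("category"), doc.page_content) for doc in docs[:limit]]
```

For example, describe_elements(load_markdown_elements(markdown_path)) gives a compact view of the first few chunks, while data[0] on the loaded list shows the complete first element.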
That is all about using the markdown loader in LangChain.
Conclusion
To use the markdown loader in LangChain, install the LangChain and unstructured packages, which together provide the markdown loader. After that, get the path of the markdown file from its directory and load it using the “UnstructuredMarkdownLoader” class, optionally with the “mode” parameter to keep the elements of the file separate. This post has illustrated the complete process of using the markdown loader in the LangChain framework.