How to Use Markdown Loader in LangChain?

Markdown is a lightweight markup language that adds formatting to plain-text documents using simple, human-readable syntax. LangChain is a framework for building applications on top of Large Language Models (LLMs), which interact with users in natural language. Markdown files such as READMEs and documentation pages are a common source of text for these applications, so they need to be loaded in a form the framework can work with.

This guide will demonstrate the process of using the markdown loader in LangChain.

How to Use Markdown Loader in LangChain?

Markdown files are stored with the “.md” extension, and their contents can be supplied to LLM applications built with LangChain. The framework provides the “UnstructuredMarkdownLoader” class, which reads a file from a given path and returns its contents as documents that the rest of the application can use.

To use the markdown loader in LangChain, simply follow this guide:

Step 1: Setup Prerequisites

Firstly, install the LangChain framework using the “pip install” command to get the UnstructuredMarkdownLoader class:

pip install langchain

Now, install the unstructured package, which the loader relies on to parse unstructured files such as markdown (the “> /dev/null” part only hides the installation output):

pip install unstructured > /dev/null

After that, simply import the “UnstructuredMarkdownLoader” class from LangChain:

from langchain.document_loaders import UnstructuredMarkdownLoader
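Newer LangChain releases moved the document loaders into the separate “langchain-community” package, so the import above may fail or show a deprecation warning depending on the installed version. The following is a minimal sketch, assuming one of the two module paths is available; the helper name find_markdown_loader is hypothetical and not part of LangChain:

```python
import importlib


def find_markdown_loader():
    """Return the UnstructuredMarkdownLoader class, or None if it cannot be imported."""
    # Try the newer community package first, then the older path used in this guide.
    for module_name in (
        "langchain_community.document_loaders",
        "langchain.document_loaders",
    ):
        try:
            module = importlib.import_module(module_name)
            return getattr(module, "UnstructuredMarkdownLoader")
        except (ImportError, AttributeError):
            continue
    return None


UnstructuredMarkdownLoader = find_markdown_loader()
if UnstructuredMarkdownLoader is None:
    print("Markdown loader not found; install langchain or langchain-community first.")
```

If the helper returns a class, it can be used exactly like the directly imported loader in the following steps.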

Step 2: Load Data

The next step is to get the path of the markdown file. In the file browser of the notebook environment, click the three dots next to the file to expand its menu, and then click the “Copy path” button.
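Outside a notebook file browser, the path can also be obtained in code. The sketch below uses only the Python standard library; the temporary folder and the sample file it writes are assumptions made so that the example runs on its own:

```python
import tempfile
from pathlib import Path

# Create a small Markdown file in a temporary folder (illustrative data only).
workdir = Path(tempfile.mkdtemp())
sample_file = workdir / "README.md"
sample_file.write_text("# Sample\n\nA short paragraph.\n", encoding="utf-8")

# List every ".md" file in the folder and resolve each one to an absolute path.
markdown_files = sorted(str(p.resolve()) for p in workdir.glob("*.md"))

# Pick the first match as the path to hand to the loader.
markdown_path = markdown_files[0]
print(markdown_path)
```

The resulting string plays the same role as the path copied from the file menu.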

Step 3: Using MarkdownLoader

Paste the path of the markdown file into a variable and pass that variable to the markdown loader:

markdown_path = "/content/sample_data/README.md"
loader = UnstructuredMarkdownLoader(markdown_path)

After that, simply execute the loader to load the contents of the markdown file:

data = loader.load()

Print the data variable containing the file contents to display them on the screen:

data
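The loader returns a list of Document objects, each with a “page_content” string and a “metadata” dictionary. The sketch below shows one way to summarize such a list; the StandInDocument class and the summarize helper are hypothetical, and the sample text is made up so that the example runs without LangChain installed:

```python
from dataclasses import dataclass, field


@dataclass
class StandInDocument:
    """Hypothetical stand-in with the same two attributes as a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize(docs, preview_chars=40):
    """Return one (source, text length, text preview) tuple per document."""
    rows = []
    for doc in docs:
        source = doc.metadata.get("source", "unknown")
        text = doc.page_content
        rows.append((source, len(text), text[:preview_chars]))
    return rows


# Made-up content standing in for the result of loader.load().
example_data = [
    StandInDocument(
        page_content="Sample heading\n\nA short paragraph.",
        metadata={"source": "/content/sample_data/README.md"},
    )
]

for row in summarize(example_data):
    print(row)
```

Passing the real data variable to summarize should work the same way, since the helper only reads the two attributes named above.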

Step 4: Retain Elements Using MarkdownLoader

By default, unstructured combines the separate elements of the document (such as titles and paragraphs) into a single text while loading. To keep these elements separate, create the UnstructuredMarkdownLoader with the extra parameter called “mode” set to “elements”:

loader = UnstructuredMarkdownLoader(markdown_path, mode="elements")

Load the contents of the file into the data variable:

data = loader.load()

Print the first item of the loaded data, which is now split into small chunks, one per element:

data[0]
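In “elements” mode, each returned item describes one element of the document. The sketch below counts the items by type; it assumes that each item's metadata carries a “category” key (for example “Title” or “NarrativeText”), which may vary with the installed unstructured version. The count_categories helper and the sample items are hypothetical:

```python
from collections import Counter
from types import SimpleNamespace


def count_categories(docs):
    """Count documents per element category, using "unknown" when the key is absent."""
    return Counter(doc.metadata.get("category", "unknown") for doc in docs)


# Made-up items standing in for the result of loader.load() in "elements" mode.
elements = [
    SimpleNamespace(page_content="Sample heading", metadata={"category": "Title"}),
    SimpleNamespace(page_content="A short paragraph.", metadata={"category": "NarrativeText"}),
    SimpleNamespace(page_content="Another paragraph.", metadata={"category": "NarrativeText"}),
]

for category, count in count_categories(elements).items():
    print(category, count)
```

A summary like this gives a quick view of how the file was split before the chunks are passed on.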

That is all about using the markdown loader in LangChain.

Conclusion

To use the markdown loader in LangChain, simply install the LangChain and unstructured packages to get the markdown loader class. After that, get the path of the markdown file and load it using the “UnstructuredMarkdownLoader” class, optionally with the “mode” parameter set to “elements” to keep the document elements separate. This post has illustrated the complete process of using the markdown loader in the LangChain framework.

About the author

Talha Mahmood