LangChain is a framework for building applications on top of Large Language Models (LLMs) that answer natural-language queries in text. Such applications often need a large pool of documents so that the model can answer a variety of questions from different users. LangChain lets developers use directory loaders to load many files from a directory at once.
This guide will demonstrate the process of using the file directory loaders in LangChain.
How to Use File Directory Loaders in LangChain?
To use the file directory loader in LangChain, follow the steps below:
Prerequisite: Install Modules and Upload Files
First, install the LangChain framework to get started with the process:
pip install langchain
Then, install the OpenAI module to connect to its environment and use its libraries:
pip install openai
The “unstructured” module is also required so that the loader can read unstructured data:
pip install unstructured
Import the os and getpass libraries, then store your OpenAI API key in an environment variable so that LangChain can connect to the OpenAI environment:
import os
import getpass
os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')
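When the notebook cell is re-run, the prompt above asks for the key again. A minimal sketch of a variant that prompts only when the variable is not already set is shown here; the helper name ensure_api_key is hypothetical and not part of LangChain or OpenAI:

```python
import os
import getpass


def ensure_api_key(var_name="OPENAI_API_KEY"):
    """Return the key stored in var_name, prompting for it only if it is unset."""
    if not os.environ.get(var_name):
        # getpass hides the typed key so it does not show up in the cell output
        os.environ[var_name] = getpass.getpass(f"{var_name}: ")
    return os.environ[var_name]
```

Calling ensure_api_key() once at the top of the notebook should be enough for the rest of the session.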
Upload Files in Directories
Upload the files to a directory. In the notebook interface, the uploaded files can be accessed by clicking the folder icon in the left panel.
Example 1: Check the Number of Loaded Documents
Import the DirectoryLoader class from LangChain's document loaders to get started.
Create a DirectoryLoader with the path of the directory (and, optionally, a glob pattern that selects the files) and store it in the loader variable.
Run the loader by calling its load() method and store the returned documents in the “docs” variable.
Check the number of loaded documents by passing the “docs” variable to len().
In this example, two documents were loaded because the directory contains only two text files.
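The steps of this example can be sketched as follows. This is only an illustration under a few assumptions: the import path depends on the installed LangChain version, the "**/*.txt" pattern is a placeholder, and the helper names load_directory and count_matching_files are hypothetical. The second helper uses only the standard library, so the document count can be cross-checked against the files on disk:

```python
from pathlib import Path


def count_matching_files(directory, pattern="**/*.txt"):
    """Count the regular files under directory that match the glob pattern."""
    return sum(1 for p in Path(directory).glob(pattern) if p.is_file())


def load_directory(directory, pattern="**/*.txt"):
    """Load every file matching pattern and return the list of documents."""
    # Assumption: newer releases ship the loaders in langchain_community,
    # older ones expose them under langchain.document_loaders.
    try:
        from langchain_community.document_loaders import DirectoryLoader
    except ImportError:
        from langchain.document_loaders import DirectoryLoader

    loader = DirectoryLoader(directory, glob=pattern)
    return loader.load()
```

With a directory named, for instance, "data", len(load_directory("data")) should normally agree with count_matching_files("data"). The two can differ when the directory contains hidden files, which DirectoryLoader skips by default.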
Example 2: Showing a Progress Bar
DirectoryLoader can also display a progress bar while it loads the files. This requires the tqdm library and is enabled by setting show_progress=True when creating the loader:
docs = loader.load()
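A possible version of the progress-bar setup is sketched here. The helper names and the glob pattern are hypothetical, and the import path is an assumption that depends on the installed LangChain version. The sketch turns the bar on only when tqdm is installed:

```python
import importlib.util


def tqdm_available():
    """Return True if the tqdm package, needed for the progress bar, is installed."""
    return importlib.util.find_spec("tqdm") is not None


def load_with_progress(directory, pattern="**/*.txt"):
    """Load matching files, showing a progress bar when tqdm is available."""
    # Assumption: the import path varies between LangChain releases.
    try:
        from langchain_community.document_loaders import DirectoryLoader
    except ImportError:
        from langchain.document_loaders import DirectoryLoader

    loader = DirectoryLoader(
        directory,
        glob=pattern,
        show_progress=tqdm_available(),  # no bar if tqdm is missing
    )
    return loader.load()
```

If tqdm is missing, it can be installed with pip install tqdm.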
Example 3: Using Multithreading
By default, DirectoryLoader loads files using a single thread. To speed up the process, enable multithreading by setting use_multithreading=True when creating the loader:
docs = loader.load()
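The multithreaded setup might look like the sketch below. The helper names, the glob pattern, and the worker count are illustrative assumptions. The first function is a standard-library illustration of reading files with a thread pool and is not LangChain's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_files_concurrently(directory, pattern="**/*.txt", workers=4):
    """Read all matching files with a thread pool and return {path: text}."""
    paths = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the results in the same order as paths
        texts = list(pool.map(lambda p: p.read_text(encoding="utf-8"), paths))
    return {str(p): text for p, text in zip(paths, texts)}


def load_multithreaded(directory, pattern="**/*.txt", workers=4):
    """Load matching files with DirectoryLoader using several threads."""
    # Assumption: the import path varies between LangChain releases.
    try:
        from langchain_community.document_loaders import DirectoryLoader
    except ImportError:
        from langchain.document_loaders import DirectoryLoader

    loader = DirectoryLoader(
        directory,
        glob=pattern,
        use_multithreading=True,
        max_concurrency=workers,  # upper bound on worker threads
    )
    return loader.load()
```

Threads mainly help when loading is dominated by file I/O, so the gain for a handful of small files is likely to be modest.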
Example 4: Change Loader Class
By default, DirectoryLoader reads every file with the loader based on the “unstructured” module. To change this, import the TextLoader class, which loads files as plain text.
Create the DirectoryLoader with loader_cls set to TextLoader so that each file matched in the directory is read as plain text.
Load the files from the directory and store them in the “docs” variable.
Pass the “docs” variable to len() to see how many files have been loaded successfully.
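A hedged sketch of this example follows. The helper names and the glob pattern are hypothetical, and the import path depends on the installed LangChain version. The summarize_documents helper relies only on the page_content and metadata attributes of the returned documents:

```python
def load_as_text(directory, pattern="**/*.txt"):
    """Load matching files as plain text by switching the loader class."""
    # Assumption: the import path varies between LangChain releases.
    try:
        from langchain_community.document_loaders import DirectoryLoader, TextLoader
    except ImportError:
        from langchain.document_loaders import DirectoryLoader, TextLoader

    loader = DirectoryLoader(directory, glob=pattern, loader_cls=TextLoader)
    return loader.load()


def summarize_documents(docs):
    """Return (source path, number of characters) for each loaded document."""
    return [(doc.metadata["source"], len(doc.page_content)) for doc in docs]
```

For example, summarize_documents(load_as_text("data")) would list each loaded file together with the length of its text, where "data" is a placeholder directory name.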
Example 5: Using PythonLoader to Load Files
To load Python source files, import the PythonLoader class.
Create the DirectoryLoader with a glob pattern that matches the .py extension and with loader_cls set to PythonLoader.
Run the loader and store the loaded files in the “docs” variable.
In this example, four Python files were loaded using PythonLoader.
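The PythonLoader variant can be sketched in the same way. The helper names are hypothetical and the import path is an assumption tied to the installed LangChain version. The first helper lists the .py files with the standard library so the result can be compared with the number of loaded documents:

```python
from pathlib import Path


def python_file_names(directory):
    """Return the sorted names of all .py files under directory."""
    return sorted(p.name for p in Path(directory).glob("**/*.py") if p.is_file())


def load_python_files(directory):
    """Load every .py file under directory with PythonLoader."""
    # Assumption: the import path varies between LangChain releases.
    try:
        from langchain_community.document_loaders import DirectoryLoader, PythonLoader
    except ImportError:
        from langchain.document_loaders import DirectoryLoader, PythonLoader

    loader = DirectoryLoader(directory, glob="**/*.py", loader_cls=PythonLoader)
    return loader.load()
```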
Example 6: Using TextLoader to Detect Auto Encoding
The files in a directory may use different encodings, and TextLoader offers several strategies for handling files that it cannot decode. Start by configuring the loader, where the path variable holds the location of the directory:
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)
Example 6.1: Using Default Behavior
By default, calling the load() function raises an error as soon as one file cannot be decoded, and no documents are returned:
loader.load()
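Before running the loader, it can help to know which files are likely to cause such a failure. The following standard-library sketch checks each file against an expected encoding. The helper name find_undecodable is hypothetical, and UTF-8 as the expected encoding is an assumption that may not match the platform default used by the loader:

```python
from pathlib import Path


def find_undecodable(directory, pattern="**/*.txt", encoding="utf-8"):
    """Return the names of matching files whose bytes are not valid in encoding."""
    bad = []
    for p in sorted(Path(directory).glob(pattern)):
        if not p.is_file():
            continue
        try:
            p.read_bytes().decode(encoding)
        except UnicodeDecodeError:
            bad.append(p.name)
    return bad
```

An empty list suggests that the default behavior should load every file without a decoding error.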
Example 6.2: Using Silent Fail
Another strategy is to pass silent_errors=True to DirectoryLoader, which skips the files that fail to load and continues with the rest:
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
docs = loader.load()
Collect the source path of each document that TextLoader managed to load and print the list:
doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
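Put together, the silent-fail strategy might be wrapped as in the sketch below. The helper names are hypothetical and the import path is an assumption that depends on the installed LangChain version:

```python
def load_skipping_failures(directory, pattern="**/*.txt"):
    """Load matching text files, skipping any file that fails to load."""
    # Assumption: the import path varies between LangChain releases.
    try:
        from langchain_community.document_loaders import DirectoryLoader, TextLoader
    except ImportError:
        from langchain.document_loaders import DirectoryLoader, TextLoader

    loader = DirectoryLoader(
        directory,
        glob=pattern,
        loader_cls=TextLoader,
        silent_errors=True,  # skip failing files instead of raising
    )
    return loader.load()


def document_sources(docs):
    """Return the source path recorded in each document's metadata."""
    return [doc.metadata["source"] for doc in docs]
```

Comparing document_sources(docs) with the files in the directory shows which files were skipped.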
Example 6.3: Auto Detect Encoding
Another strategy is to let TextLoader detect the encoding of a file automatically before giving up on it. Pass autodetect_encoding through loader_kwargs so that DirectoryLoader forwards it to TextLoader:
text_loader_kwargs = {'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()
Collect the source paths of the loaded documents and print the list to see which files were loaded:
doc_sources = [doc.metadata['source'] for doc in docs]
doc_sources
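To illustrate the general idea behind encoding detection, the sketch below tries a short list of candidate encodings in order. This is a simplified standard-library illustration and not the detection logic LangChain uses; the helper name read_with_fallback and the candidate list are assumptions:

```python
from pathlib import Path


def read_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Decode the file with the first encoding that works.

    Returns a (text, encoding_used) tuple and raises ValueError
    if none of the candidate encodings can decode the file.
    """
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")
```

Because latin-1 can decode any byte sequence, it works as a last resort here, although the resulting text may be wrong if the file actually uses a different encoding.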
That is all about using the file directory loaders in LangChain.
Conclusion
To use the file directory loader in LangChain, install the LangChain, OpenAI, and unstructured modules, then load files from a directory with DirectoryLoader. The class supports several options, including a progress bar, multithreading, alternative loader classes such as TextLoader and PythonLoader, and strategies for handling encoding errors. This guide has illustrated each of these methods of using the file directory loader in LangChain.