What Is the Tarfile Module in Python?
We don’t need to install the tarfile module separately because it is part of the Python standard library; all we need to do is import it. The built-in “tarfile” module provides a powerful interface for reading and writing tar archives, and its methods make working with them much simpler than handling the format by hand.
File Modes to Open the Python TAR Files
The following are some file modes that can be used in Python to open a tarfile:
“r” or “r:*”: Opens a TAR file for reading, transparently handling any supported compression (this is the default mode).
“r:”: Opens a TAR file for reading without compression.
“w” or “w:”: Opens a TAR file for uncompressed writing.
“a” or “a:”: Opens a TAR file so that data can be appended to it without compression.
“r:gz”: Opens a gzip-compressed TAR file for reading.
“w:gz”: Opens a gzip-compressed TAR file for writing.
“r:bz2”: Opens a bzip2-compressed TAR file for reading.
“w:bz2”: Opens a bzip2-compressed TAR file for writing.
Now, let’s use the tarfile module to understand how it works in Python.
Creating a Tar File
The tarfile module in Python enables us to create tar files. First, open a tar file in write mode. Then, add a bundle of files or a single file to the tar file.
Example 1: Creating a Tar File Using the Open() Function
Here, we use the open() function to create a tar file and the add() method to add other files to the tar file.
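A minimal sketch of this pattern follows. It assumes three hypothetical text files named “file1.txt”, “file2.txt”, and “file3.txt”, creates them in a temporary directory so the example is self-contained, opens “My_file.tar” in “w” mode, and calls add() once per file.

```python
import os
import tarfile
import tempfile


def create_tar(tar_path, file_paths):
    """Create an uncompressed tar archive containing the given files."""
    # "w" opens the archive for uncompressed writing and overwrites any existing file.
    with tarfile.open(tar_path, "w") as tar:
        for path in file_paths:
            # arcname stores only the base name, not the full temporary path.
            tar.add(path, arcname=os.path.basename(path))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("file1.txt", "file2.txt", "file3.txt"):  # hypothetical files
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(f"contents of {name}\n")
            paths.append(path)

        archive = os.path.join(tmp, "My_file.tar")
        create_tar(archive, paths)
        with tarfile.open(archive, "r") as tar:
            return tar.getnames()


if __name__ == "__main__":
    print(demo())  # ['file1.txt', 'file2.txt', 'file3.txt']
```

The with statement closes the archive automatically, which takes the place of an explicit close() call on the file object.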
Here, the first argument to the open() method is the name of the tar file to create, followed by “w” to open that file in write mode. The name of each file to archive is passed to the add() method. We archive three files into our tar file by calling add() on the file object.
Example 2: The os.listdir() Function to Create and List Files
The os.listdir() function returns a list of the files and directories that reside in a specified directory; to use it, we have to import the os module. Using the tarfile module, we first create our tar file. Then, we use os.listdir() to get the names of the files in a directory and add each of them to the tar file with the add() function. After adding the files, we obtain the list of files archived in the “.tar” file with the getnames() method.
The tar file “folder” is created by archiving all the files stored in the “data” folder. The getnames() function returns the file names that are stored in the tar file “folder”.
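A minimal sketch of Example 2 follows, assuming a “data” directory that holds three hypothetical files and an archive named “folder.tar”. Sorting the os.listdir() result is our own choice to make the member order predictable, since os.listdir() returns names in arbitrary order.

```python
import os
import tarfile
import tempfile


def archive_directory(src_dir, tar_path):
    """Add every entry returned by os.listdir(src_dir) to a new tar file."""
    with tarfile.open(tar_path, "w") as tar:
        # os.listdir() returns bare names, so join them with the directory path.
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)
    # Reopen the archive for reading and list what was stored.
    with tarfile.open(tar_path, "r") as tar:
        return tar.getnames()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "data")
        os.mkdir(data_dir)
        for name in ("a.txt", "b.txt", "c.txt"):  # hypothetical contents of "data"
            with open(os.path.join(data_dir, name), "w") as f:
                f.write(name + "\n")
        # Keep the archive outside "data" so it is not added to itself.
        return archive_directory(data_dir, os.path.join(tmp, "folder.tar"))


if __name__ == "__main__":
    print(demo())  # ['a.txt', 'b.txt', 'c.txt']
```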
How to Verify Whether a File Is Tar or Not
Using the is_tarfile() function in the tarfile module, we can quickly determine whether a file is a tar archive that the tarfile module can read. The check inspects the file’s contents, not its “.tar” extension.
We pass the name of the “My_file.tar” file to the is_tarfile() function to check whether it is a tar file. The function returns True, which means that the specified file is a tar file. Let’s check another file.
Since “note.txt” is not a tar file, the function returns False as output.
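The following minimal sketch reproduces both checks with hypothetical files in a temporary directory. As an extra illustration that the test looks at contents rather than names, it also copies the archive to “backup.bin”, a hypothetical name without the “.tar” extension, and is_tarfile() still returns True for it.

```python
import os
import shutil
import tarfile
import tempfile


def demo():
    """Return is_tarfile() results for a tar archive, a text file, and a renamed archive."""
    with tempfile.TemporaryDirectory() as tmp:
        note = os.path.join(tmp, "note.txt")
        with open(note, "w") as f:
            f.write("This is a plain text note.\n")

        archive = os.path.join(tmp, "My_file.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(note, arcname="note.txt")

        # Same bytes as My_file.tar, but without the .tar extension.
        renamed = os.path.join(tmp, "backup.bin")
        shutil.copyfile(archive, renamed)

        return {
            os.path.basename(path): tarfile.is_tarfile(path)
            for path in (archive, note, renamed)
        }


if __name__ == "__main__":
    print(demo())  # {'My_file.tar': True, 'note.txt': False, 'backup.bin': True}
```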
Reading a Tar File
The tarfile library can read tar files without extracting them first. The tarfile.open(filepath, mode) function opens a tar file in Python, where filepath is the absolute or relative path of the file that we want to read and mode is one of the file modes listed earlier, such as “r”.
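As a minimal sketch of reading without extracting, the code below opens an archive in “r” mode and iterates over it. Iterating a TarFile yields one TarInfo object per member, whose name and size attributes describe the stored file. The archive name and member names are hypothetical.

```python
import os
import tarfile
import tempfile


def list_members(tar_path):
    """Return (name, size in bytes) for each member without extracting anything."""
    with tarfile.open(tar_path, "r") as tar:
        return [(member.name, member.size) for member in tar]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "project.tar")
        with tarfile.open(archive, "w") as tar:
            for name, payload in (("a.txt", b"abc"), ("b.txt", b"hello")):
                path = os.path.join(tmp, name)
                with open(path, "wb") as f:  # binary mode keeps the byte count exact
                    f.write(payload)
                tar.add(path, arcname=name)
        return list_members(archive)


if __name__ == "__main__":
    print(demo())  # [('a.txt', 3), ('b.txt', 5)]
```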
We can also open compressed tar files. A compression method may optionally be combined with the operation mode, so the general syntax becomes mode[:compression], for example “r:bz2”. The supported compression abbreviations are as follows:
- gz for gzip
- xz for lzma
- bz2 for bzip2
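Here is a minimal sketch of that round trip, assuming a hypothetical file “report.txt” and archive name “project.tar.bz2”: the archive is written with “w:bz2” and read back with “r:bz2”. Checking the first three bytes for the bzip2 signature b"BZh" is only there to show that the data on disk really is compressed.

```python
import os
import tarfile
import tempfile


def bz2_round_trip(tmp):
    """Write a bzip2-compressed tar file in tmp, then read it back."""
    src = os.path.join(tmp, "report.txt")  # hypothetical input file
    with open(src, "w") as f:
        f.write("quarterly numbers\n" * 100)

    archive = os.path.join(tmp, "project.tar.bz2")
    with tarfile.open(archive, "w:bz2") as tar:  # compress while writing
        tar.add(src, arcname="report.txt")

    with open(archive, "rb") as f:
        signature = f.read(3)  # bzip2 streams start with b"BZh"

    with tarfile.open(archive, "r:bz2") as tar:  # decompress while reading
        names = tar.getnames()
    return names, signature


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        return bz2_round_trip(tmp)


if __name__ == "__main__":
    print(demo())  # (['report.txt'], b'BZh')
```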
In the previous code, we first open a bz2 compressed TAR file for writing. Then, we open the bz2 compressed TAR file for reading.
Checking the Content of the Tar File
Using the getnames() method of a TarFile object, we can view a tar file’s contents without extracting them; the method returns the list of member names. Here, we open the file in read mode, so “r” is specified as the second argument to the open() function.
We open the “project.tar” file in read mode. The getnames() method returns the names of the files archived in “project.tar”, and a for loop iterates over them to print each name.
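A minimal sketch of this listing step follows. It builds a small “project.tar” with two hypothetical members first, so that there is something to list.

```python
import os
import tarfile
import tempfile


def print_contents(tar_path):
    """Print and return the member names of a tar file without extracting it."""
    with tarfile.open(tar_path, "r") as tar:
        names = tar.getnames()
    for name in names:
        print(name)
    return names


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "project.tar")
        with tarfile.open(archive, "w") as tar:
            for name in ("books.xml", "notes.txt"):  # hypothetical members
                path = os.path.join(tmp, name)
                with open(path, "w") as f:
                    f.write(name + "\n")
                tar.add(path, arcname=name)
        return print_contents(archive)


if __name__ == "__main__":
    demo()  # prints books.xml, then notes.txt
```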
Appending Files to Tar File
Using the add() method, we can add files directly to an existing tar file, just as we did when we created one. Unlike the previous examples, “a” is passed to the open() method as the second argument because the file must be opened in append mode.
We open the “project.tar” file in append mode. Five files are already present in “project.tar”, and we append three more by calling add() on the file object.
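A minimal sketch of appending follows, using hypothetical file names and a smaller starting archive (two members instead of five). Append mode works only on uncompressed archives; modes such as “a:gz” are not supported by tarfile.

```python
import os
import tarfile
import tempfile


def write_file(directory, name):
    """Create a small hypothetical text file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(f"{name}\n")
    return path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "project.tar")

        # Start with an uncompressed archive holding two files.
        with tarfile.open(archive, "w") as tar:
            for name in ("books.xml", "notes.txt"):
                tar.add(write_file(tmp, name), arcname=name)

        # "a" reopens the same archive and adds members after the existing ones.
        with tarfile.open(archive, "a") as tar:
            for name in ("extra1.txt", "extra2.txt", "extra3.txt"):
                tar.add(write_file(tmp, name), arcname=name)

        with tarfile.open(archive, "r") as tar:
            return tar.getnames()


if __name__ == "__main__":
    print(demo())
```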
Extracting a Specific File from a Tar Archive File
The extract() method of a TarFile object can be used to extract a single file from an archive. It accepts the name of the member as input and extracts it into our working directory (or into the directory given as its path argument). If you want to extract more than one file, you have to call extract() multiple times. If you only need to read a member’s contents without writing it to disk, use the extractfile() method instead, which returns the member as a file object. In both cases, the member can be a TarInfo object or a filename. extractfile() returns an io.BufferedReader object if the member is a regular file or a link, and None for all other existing members. A KeyError is raised if the member is not present in the archive.
The “books.xml” file is extracted from the “project.tar” file.
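The sketch below, with a hypothetical “books.xml” member, contrasts the two methods: extract() writes the member to disk, extractfile() hands back its bytes, and a missing name raises KeyError. The filter="data" argument is passed only when the running Python supports extraction filters (3.12 and later, plus some security releases of older versions); it is a precaution added here, not part of the original example.

```python
import os
import tarfile
import tempfile

# Use the "data" extraction filter when this Python version provides it;
# it rejects unsafe members such as absolute paths.
SAFE = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "books.xml")  # hypothetical member
        with open(src, "wb") as f:
            f.write(b"<books/>\n")
        archive = os.path.join(tmp, "project.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(src, arcname="books.xml")

        out_dir = os.path.join(tmp, "out")
        os.mkdir(out_dir)
        with tarfile.open(archive, "r") as tar:
            # extract() writes one member to disk under out_dir.
            tar.extract("books.xml", path=out_dir, **SAFE)

            # extractfile() returns a readable file object; nothing is written.
            with tar.extractfile("books.xml") as member:
                in_memory = member.read()

            try:
                tar.extractfile("missing.xml")
                missing_raises = False
            except KeyError:
                missing_raises = True

        with open(os.path.join(out_dir, "books.xml"), "rb") as f:
            on_disk = f.read()
        return on_disk, in_memory, missing_raises


if __name__ == "__main__":
    print(demo())  # (b'<books/>\n', b'<books/>\n', True)
```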
Extracting All Files from the Tar File
The extractall() method of a TarFile object can be used to extract the whole archive rather than just one particular file.
Let’s extract the whole contents of the tar file into a folder named “data” in our current directory with the extractall() method.
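A minimal sketch of this step follows, extracting a hypothetical three-file “project.tar” into a “data” folder inside a temporary directory. The Python documentation warns against extracting archives from untrusted sources without inspecting them, so the sketch again passes the “data” filter when the running Python supports it; that precaution is our addition.

```python
import os
import tarfile
import tempfile

# Use the "data" extraction filter when this Python version provides it.
SAFE = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def extract_everything(tar_path, dest_dir):
    """Extract every member of tar_path into dest_dir and list the result."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(tar_path, "r") as tar:
        tar.extractall(path=dest_dir, **SAFE)
    return sorted(os.listdir(dest_dir))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "project.tar")
        with tarfile.open(archive, "w") as tar:
            for name in ("books.xml", "notes.txt", "report.txt"):  # hypothetical
                path = os.path.join(tmp, name)
                with open(path, "w") as f:
                    f.write(name + "\n")
                tar.add(path, arcname=name)
        return extract_everything(archive, os.path.join(tmp, "data"))


if __name__ == "__main__":
    print(demo())  # ['books.xml', 'notes.txt', 'report.txt']
```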
The function successfully extracted all the files to the specified folder.
Conclusion
In this article, we learned what tar files, or tar archives, are and how to use Python’s tarfile module to create, access, and manage them. We learned the file modes for opening tar files for reading, writing, and appending, with or without compression. We explained how to list the files stored in a tar archive and how to append files to an existing one. Finally, we learned how to extract a specific file or all files from an archive.