
zlib Python

The Python zlib module provides a Python interface to the zlib library, which is written in C and offers a higher-level abstraction over the DEFLATE lossless compression algorithm. zlib is free and not covered by patents, so you can use it in commercial products. It is portable across platforms, and because the compression is lossless, no data is lost between compression and decompression. Another significant advantage is that zlib essentially never enlarges the data: incompressible input grows only by a small framing overhead. This tutorial covers the zlib functions for compressing and decompressing data strings, data streams, and files.

Why Use zlib in Python?

Compression plays a significant role in many applications, although zlib itself provides no encryption. Numerous applications need the ability to compress and decompress arbitrary data, including strings, files, and structured in-memory content. One of the best features of the zlib module is its compatibility with the format used by gzip, a widely used compression tool on UNIX.

How Do We Use zlib’s Compression and Decompression Functions in Python?

Compression and decompression are the two most significant operations offered by the zlib library. Both can be performed in a single call or incrementally by breaking the data into chunks, as you would with a stream of data. The following sections explain both modes of operation.

Compression in Python zlib

Now, we will see how to compress data strings, data streams, and files using the zlib library.

Compressing Data Strings

A string of data can be compressed with the zlib library’s compress() function. Its syntax is simple: it requires the data to compress and accepts an optional compression level.

Syntax:

compress(data_string, level)

Here, data_string is the data to be compressed, which must be a bytes object in Python 3 (encode text strings first), and level is an integer from -1 to 9 that sets the compression level. Level 1 is the fastest but gives the least compression. Level 9 is the slowest but gives the most compression. The value -1 selects the default, which currently corresponds to level 6 and balances speed against compression. At level 0, no compression takes place.

An example of how to apply the compress() function to a string is provided below:
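A minimal sketch of such a call, assuming a short sample byte string and level 2 (the sample text and variable names are illustrative assumptions, not taken from the original listing):

```python
import zlib

# Illustrative sample input. zlib works on bytes, so text is encoded first.
text = "I love python and zlib!"
data = text.encode("utf-8")

# Level 2 favors speed over compression ratio.
compressed = zlib.compress(data, 2)

# Show the result in hexadecimal form, together with its length.
hex_output = compressed.hex()
print("Compressed (hex):", hex_output)
print("Hex length:", len(hex_output))
```

Very short inputs with little repetition may not shrink at all, because the header and trailer add a few bytes of their own.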

If we specify the level parameter as 0, there will be no compression:
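The same kind of call at level 0 can be sketched as follows, again assuming an illustrative sample byte string. At this level the input bytes are stored unchanged inside the output, wrapped in the stream's framing:

```python
import zlib

# Illustrative sample input (an assumption, not the original listing's data).
data = "I love python and zlib!".encode("utf-8")

# Level 0 stores the data without compressing it.
stored = zlib.compress(data, 0)

hex_output = stored.hex()
print("Stored (hex):", hex_output)
print("Hex length:", len(hex_output))

# The output is longer than the input because a header and trailer are added.
print("Input bytes:", len(data), "Output bytes:", len(stored))
```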

Comparing the results for compression levels 0 and 2 shows a small difference. In hexadecimal format, the function returned a string of length 62 at level 2, whereas it returned a string of length 68 at level 0. The difference arises because no compression occurs at level 0: the data is stored as is, with a header and trailer added.

Compressing Data Streams

The compressobj() function can handle large data streams. It returns a compression object, which compresses data incrementally.

Syntax:

compressobj(level, method, wbits, memLevel, strategy)

Apart from the missing data argument, the most notable difference between the parameters of this function and those of compress() is the wbits parameter, which controls the window size and determines whether a header and trailer are included in the output. The possible wbits values are:

Value Range    Window size logarithm                Output
+9 to +15      The value itself (base 2)            zlib header and trailer are included.
-9 to -15      The absolute value of wbits          Raw stream; header and trailer are excluded.
+25 to +31     The lowest four bits of the value    gzip header and trailing checksum are included.

The method argument specifies the compression algorithm; DEFLATED is currently the only supported value. The memLevel argument controls how much memory is used for the internal compression state, and the strategy argument tunes the compression algorithm.
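The effect of the wbits ranges in the table above can be checked with a short sketch. The helper name deflate and the sample bytes are assumptions introduced for illustration only:

```python
import zlib

# Illustrative sample bytes, repeated so that they compress well.
data = b"I love writing python codes " * 20

def deflate(payload, wbits):
    """Compress payload in one pass using the given wbits value."""
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, wbits)
    return compressor.compress(payload) + compressor.flush()

zlib_stream = deflate(data, 15)   # +9 to +15: zlib header and trailer
raw_stream = deflate(data, -15)   # -9 to -15: raw stream, no header or trailer
gzip_stream = deflate(data, 31)   # +25 to +31 (16 + 15): gzip header and trailer

# gzip streams start with the magic bytes 1f 8b.
print("zlib stream starts with:", zlib_stream[:2].hex())
print("gzip stream starts with:", gzip_stream[:2].hex())
print("Sizes (raw, zlib, gzip):", len(raw_stream), len(zlib_stream), len(gzip_stream))
```

The raw stream is the shortest of the three because it carries no header or trailer at all.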

The following code shows how to use the compressobj() function:
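A minimal sketch of chunked compression with a compression object, assuming the sample string discussed in this section and a deliberately tiny chunk size (the chunk size and variable names are illustrative assumptions):

```python
import zlib

data = b"I love writing python codes"

# Default level, DEFLATED method, and a zlib-wrapped stream.
compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, zlib.MAX_WBITS)

# Feed the data in small pieces, as if it arrived from a stream.
chunk_size = 8
pieces = []
for start in range(0, len(data), chunk_size):
    pieces.append(compressor.compress(data[start:start + chunk_size]))

# flush() emits whatever is still buffered and finishes the stream.
pieces.append(compressor.flush())

compressed = b"".join(pieces)
print("Compressed (hex):", compressed.hex())
```

Individual compress() calls may return empty bytes while the object buffers input, which is why the final flush() is essential.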

To demonstrate how compressobj() operates, we used a simple string rather than a large data stream: the string “I love writing python codes” has been compressed. This function is typically used when a stream is too big to hold in memory at once. It is important in larger applications because it lets us customize the compression and compress data in chunks.

Compressing a File

The file is compressed using the compress() function. In the example below, a .docx file is compressed.
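A minimal sketch of compressing a file, under the assumption that the file fits in memory. To stay self-contained, it first writes a hypothetical stand-in text file to a temporary directory instead of using a real .docx file; the file names are illustrative only:

```python
import os
import tempfile
import zlib

# Create a hypothetical stand-in input file in a temporary directory.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "sample_input.txt")
target_path = os.path.join(workdir, "sample_input.txt.zlib")
with open(source_path, "wb") as source_file:
    source_file.write(b"I love writing python codes\n" * 200)

# Read the whole file as bytes and compress it at the highest level.
with open(source_path, "rb") as source_file:
    original = source_file.read()
compressed = zlib.compress(original, zlib.Z_BEST_COMPRESSION)

# Write the compressed bytes to a new file.
with open(target_path, "wb") as target_file:
    target_file.write(compressed)

# Ratio of compressed length to original length.
ratio = len(compressed) / len(original)
print(f"Compressed size is {ratio:.0%} of the original size")
```

Files in formats that are already compressed, such as .docx, usually shrink far less than plain text does.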

The compress() function is called with Z_BEST_COMPRESSION, the highest compression level this algorithm offers. The degree of compression is computed as the ratio of the compressed data length to the original data length. As the output shows, the file is 6% compressed.

Decompression in Python zlib

Now, we will see how to decompress data strings, data streams, and files using the zlib library.

Decompressing Data Strings

The decompress() function makes it simple to decompress a string of compressed data.

Syntax:

decompress(data_string, wbits, bufsize)

This function decompresses the bytes in data_string. The wbits parameter sets the size of the history buffer (the window) and also determines which header and trailer format is expected in the input. By default, the largest window size is used. Possible values are:

Value Range                    Window size logarithm                Input
+8 to +15                      The value itself (base 2)            zlib header and trailer are included.
-8 to -15                      The absolute value of wbits          Raw stream; header and trailer are excluded.
+24 to +31 = 16 + (8 to 15)    The lowest four bits of the value    gzip header and trailer are included.
+40 to +47 = 32 + (8 to 15)    The lowest four bits of the value    zlib or gzip format, detected automatically.

The bufsize argument specifies the initial size of the output buffer. If the given size is too small, the buffer grows automatically as needed. The compressed data string can be decompressed as in the following example:
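A minimal sketch of one-shot decompression, assuming an illustrative sample byte string. It also exercises an explicit wbits value, a deliberately small bufsize, and the automatic format detection described in the table above:

```python
import gzip
import zlib

# Illustrative sample bytes (an assumption for this sketch).
original = b"hello world I love python"
compressed = zlib.compress(original)

# Default call: expects a zlib header and trailer.
restored = zlib.decompress(compressed)

# Explicit wbits and a small initial buffer of 16 bytes;
# the output buffer grows as needed.
restored_small_buffer = zlib.decompress(compressed, zlib.MAX_WBITS, 16)

# wbits = 32 + 15 accepts either zlib or gzip input.
gzip_data = gzip.compress(original)
restored_auto = zlib.decompress(gzip_data, 32 + zlib.MAX_WBITS)

print(restored.decode("utf-8"))
```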

Decompressing Data Streams

Depending on the source and size of your data, decompressing large data streams may require careful memory management. If you do not have enough memory to hold the whole stream, the decompressobj() function lets you split it into multiple chunks and decompress them one at a time.

Syntax:

decompressobj(wbits [, zdict])

We use the decompression object returned by decompressobj() to decompress the data. The wbits parameter behaves as described earlier for the decompress() function.
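A minimal sketch of chunked decompression, assuming a 26-byte sample stream and a deliberately tiny chunk size (both are illustrative assumptions, as are the variable names):

```python
import zlib

# Illustrative 26-byte sample stream, compressed in one shot for the demo.
data = b"hello world I love python\n"
compressed = zlib.compress(data)

decompressor = zlib.decompressobj()
restored = bytearray()

# Feed the compressed stream to the object a few bytes at a time.
chunk_size = 8
for start in range(0, len(compressed), chunk_size):
    restored += decompressor.decompress(compressed[start:start + chunk_size])

# flush() returns any output that is still buffered.
restored += decompressor.flush()

print("Compressed stream length:", len(compressed))
print("Decompressed stream length:", len(restored))
```

In a real application, the chunks would typically come from a file or a network socket rather than from slices of an in-memory bytes object.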

The length of the compressed stream is 34. After decompression, a stream of length 26 is returned, which was the actual size of our data stream.

Decompressing a File

The data in a file can be decompressed just as easily as in the previous examples. The only difference from the prior example is that the decompress() function is called after reading the compressed data from the file. This approach is useful when the data is small enough to fit comfortably in memory.
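A minimal sketch of this approach. To stay self-contained, it first writes a small compressed file to a temporary directory; the directory, the preparation step, and the variable names are assumptions made for illustration:

```python
import os
import tempfile
import zlib

# Preparation: create a file that holds zlib-compressed data.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "sample.txt")
with open(path, "wb") as compressed_file:
    compressed_file.write(zlib.compress(b"hello world I love python"))

# Read the compressed bytes back from the file.
with open(path, "rb") as compressed_file:
    compressed = compressed_file.read()

# The data is small, so a single decompress() call is sufficient.
restored = zlib.decompress(compressed)
print(restored.decode("utf-8"))
```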

The sample.txt file containing the string “hello world I love python” has been read. We chose to use decompress() rather than decompressobj() because the file contains a small string.

Conclusion

We discussed that the Python zlib module is useful when an application requires lossless compression. Although the zlib library has many functions, we have covered a few of the most common ones. The compress() and decompress() functions suit small data, whereas the compressobj() and decompressobj() functions compress and decompress large data streams in chunks, and so offer greater flexibility.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.