How to Work with AutoFeatureExtractor in Transformers?

Transformers is an important library in NLP that lays the foundation for many complex and unique algorithms that simplify day-to-day tasks. It offers a large collection of classes and functions for tasks such as image classification, model training, prediction, and generating or classifying different types of text data.

This article provides a comprehensive guide on how to work with AutoFeatureExtractor in Transformers.

How to Work with AutoFeatureExtractor in Transformers?

A feature extractor prepares the inputs for audio and vision models: it processes raw data such as audio waveforms or images and extracts the input features the model expects. These features range from normalized raw audio sequences and Log-Mel spectrogram features to preprocessed image features.
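To make this concrete, the sketch below imitates one step an audio feature extractor typically performs on a raw waveform: scaling it to zero mean and unit variance. The function normalize_waveform is a hypothetical helper written for illustration only; it is not part of the Transformers API, although extractors such as the one for Wav2Vec2 apply a similar normalization internally when it is enabled in their configuration.

```python
import numpy as np


def normalize_waveform(waveform, eps=1e-7):
    """Hypothetical helper: scale a 1-D waveform to zero mean and unit variance."""
    x = np.asarray(waveform, dtype=np.float32)
    # eps keeps the division safe for silent (constant) signals
    return (x - x.mean()) / np.sqrt(x.var() + eps)


# A short synthetic "waveform" standing in for real audio samples
raw = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5])
features = normalize_waveform(raw)
print(features.mean(), features.std())
```

Normalizing every clip the same way means the model always sees inputs on a comparable scale, regardless of how loudly the audio was recorded.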

Let us walk through the steps for using AutoFeatureExtractor in Transformers:

Step 1: Install Transformers
To get started with AutoFeatureExtractor, we will first install the transformers library using the pip command:

!pip install transformers
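After the installation finishes, you can confirm that the library is importable and see which version pip installed. The helper package_version below is a hypothetical convenience function built on the standard library's importlib; it assumes the import name and the pip distribution name are the same, which holds for transformers.

```python
import importlib.metadata
import importlib.util


def package_version(name):
    """Return the installed version of `name`, or None if it is not available."""
    if importlib.util.find_spec(name) is None:
        return None  # nothing importable under this name
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but without pip metadata (e.g. a standard-library module)
        return None


print("transformers:", package_version("transformers"))
```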

Step 2: Install Pydub Library
In this tutorial, we are working with an audio model. Therefore, install the pydub library, which provides a simple, high-level interface for manipulating audio, along with NumPy. To install both libraries, run the following command:

!pip install pydub numpy
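pydub only handles WAV natively; to open MP3 files it calls an external ffmpeg (or libav's avconv) executable. The hypothetical helper find_audio_backend below is a quick standard-library check, assumed for illustration, that one of them is on the PATH before you try to load an MP3.

```python
import shutil


def find_audio_backend():
    """Return the path of an ffmpeg/avconv executable on PATH, or None if neither is found."""
    for exe in ("ffmpeg", "avconv"):
        path = shutil.which(exe)
        if path is not None:
            return path
    return None


backend = find_audio_backend()
print("Audio backend:", backend or "not found; pydub cannot decode MP3 files")
```

If nothing is found, install ffmpeg with your system package manager before continuing.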

Step 3: Import AudioSegment
After installing the pydub library, import "AudioSegment" from it. Also import the NumPy library, which is widely used for numerical computation in machine learning and deep learning:

from pydub import AudioSegment
import numpy as np

Step 4: Convert the File
Now, use the AudioSegment.from_mp3() method to load the MP3 audio file into a pydub "AudioSegment" object, which will later be converted to an array and passed to the AutoFeatureExtractor. To load the file, run the following command:

audio = AudioSegment.from_mp3("/content/Hi there this is you.mp3")

In this code, we have uploaded a file to Google Colab and passed its path to the AudioSegment.from_mp3() function.

To upload an MP3 file to Google Colab, open the Files pane in the left sidebar and click the upload option.

After that, select the file and upload it to Google Colab. Note that this file will only remain available for the current Google Colab session.
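Speech checkpoints are usually trained on audio with a fixed sampling rate and a single channel, so it can help to convert the loaded segment before extracting features. The sketch below assumes a 16 kHz mono target, which is typical for models such as wav2vec2 but should be checked against the checkpoint you use; prepare_segment is a hypothetical helper, while set_channels() and set_frame_rate() are standard pydub AudioSegment methods.

```python
TARGET_SAMPLE_RATE = 16_000  # ASSUMPTION: common for speech models; check your checkpoint


def prepare_segment(segment, target_rate=TARGET_SAMPLE_RATE):
    """Hypothetical helper: return a mono copy of a pydub AudioSegment at `target_rate` Hz.

    AudioSegment objects are immutable: set_channels() and set_frame_rate()
    each return a new segment and leave the original untouched.
    """
    return segment.set_channels(1).set_frame_rate(target_rate)


# Usage in the Colab notebook (requires pydub, ffmpeg and the uploaded MP3):
# audio = prepare_segment(AudioSegment.from_mp3("/content/Hi there this is you.mp3"))
# print(audio.frame_rate, audio.channels)
```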

Step 5: Convert to Numpy Array
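Now we will convert this pydub object to a NumPy array. Passing audio.raw_data through bytearray() would produce one value per byte, so each 16-bit sample would be split into two meaningless numbers. Instead, use get_array_of_samples(), which returns one integer per sample, and scale the result to floats between -1 and 1 (for a stereo file, call audio.set_channels(1) first, because the samples of both channels are interleaved):

# One value per sample (not per byte), scaled to floats in [-1, 1]
audio_arr = np.array(audio.get_array_of_samples(), dtype=np.float32) / (1 << (8 * audio.sample_width - 1))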
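The difference between reading audio bytes one at a time and decoding them with the sample width is easy to see on a tiny buffer. The sketch below packs four 16-bit samples into bytes, similar to how pydub stores them in raw_data, and then reads them back both ways; it uses only the standard library and NumPy, and the sample values are made up for illustration.

```python
import array

import numpy as np

# Four 16-bit samples packed into bytes, similar to how audio.raw_data stores them
samples = array.array("h", [0, 1000, -1000, 32767])
raw_data = samples.tobytes()

# Treating the buffer as bytes yields 8 values in 0..255 that are not samples
per_byte = np.array(bytearray(raw_data))

# Interpreting the buffer with the sample width restores the 4 original samples
per_sample = np.frombuffer(raw_data, dtype=np.int16)

# Dividing by 2**15 maps 16-bit integers into the [-1, 1) float range
as_float = per_sample.astype(np.float32) / 2**15

print(per_byte.size, per_sample.tolist(), as_float.round(3).tolist())
```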

Step 6: Extract Features
To extract features, import the AutoFeatureExtractor class from the "transformers" library. The from_pretrained() method loads the preprocessing configuration of a pretrained checkpoint (it does not train a model) and returns the matching feature extractor, which we assign to the "feature_extractor" variable. Then, pass "audio_arr" to feature_extractor() to extract the features of the audio file:

from transformers import AutoFeatureExtractor

feature_extractor = AutoFeatureExtractor.from_pretrained(
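A minimal version of this step might look like the sketch below. It assumes the checkpoint facebook/wav2vec2-base-960h, a public 16 kHz English speech model chosen only for illustration (the article's own checkpoint is not shown), and the helper names load_feature_extractor and extract_features are hypothetical.

```python
import numpy as np

# ASSUMPTION: an example speech checkpoint, not necessarily the one used in the article
CHECKPOINT = "facebook/wav2vec2-base-960h"
SAMPLING_RATE = 16_000  # ASSUMPTION: the rate this checkpoint expects


def load_feature_extractor(checkpoint=CHECKPOINT):
    # Imported inside the function so this file can be loaded without transformers installed
    from transformers import AutoFeatureExtractor

    # from_pretrained loads the checkpoint's preprocessing config; nothing is trained
    return AutoFeatureExtractor.from_pretrained(checkpoint)


def extract_features(feature_extractor, audio_arr, sampling_rate=SAMPLING_RATE):
    # Passing sampling_rate lets the extractor check the audio matches the rate it expects
    return feature_extractor(
        np.asarray(audio_arr, dtype=np.float32),
        sampling_rate=sampling_rate,
        return_tensors="np",
    )


# Usage (needs transformers and network access):
# feature_extractor = load_feature_extractor()
# inputs = extract_features(feature_extractor, audio_arr)
# print(inputs.keys())
```

Requesting NumPy arrays with return_tensors="np" avoids depending on a deep learning framework at this stage; use "pt" instead when the features go straight into a PyTorch model.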

The output consists of numerical values: the feature extractor returns the extracted features as an array, together with an "attention mask", which is also an array.
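The attention mask matters most when several clips of different lengths are processed together: they are padded to a common length, and the mask marks real samples with 1 and padding with 0 so the model can ignore the padded positions. The function pad_with_mask below is a hypothetical helper that mirrors this convention for illustration; it is not the library's implementation.

```python
import numpy as np


def pad_with_mask(clips, padding_value=0.0):
    """Hypothetical helper: right-pad 1-D clips to equal length and build an attention mask."""
    max_len = max(len(c) for c in clips)
    values = np.full((len(clips), max_len), padding_value, dtype=np.float32)
    mask = np.zeros((len(clips), max_len), dtype=np.int32)
    for i, clip in enumerate(clips):
        values[i, : len(clip)] = clip
        mask[i, : len(clip)] = 1  # 1 = real sample, 0 = padding the model should ignore
    return {"input_values": values, "attention_mask": mask}


batch = pad_with_mask([[0.1, 0.2, 0.3], [0.5]])
print(batch["attention_mask"].tolist())  # [[1, 1, 1], [1, 0, 0]]
```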

That is all from this guide. The link to the above Google Colab is also mentioned.


To implement AutoFeatureExtractor, load it with AutoFeatureExtractor.from_pretrained() from the "transformers" library using a pretrained checkpoint, and pass it the audio after converting the pydub object to a NumPy array. AutoFeatureExtractor is an important utility of the Transformers library and belongs to its Auto Classes, which automatically select the right class for a given checkpoint. This article is a step-by-step guide for working with AutoFeatureExtractor in Transformers.

About the author

Syed Minhal Abbas

I hold a master's degree in computer science and work as an academic researcher. I am eager to read about new technologies and share them with the rest of the world.