Python

XLSX to CSV in Python

In recent years, Python has risen as one of the dominant programming languages. The Python community has significantly expanded as a result of Python’s popularity and utility. We’ll go into detail in this article about how to use Python to convert XLSX to CSV. We are aware that excel files are frequently used to perform financial analysis, organize the data, data entry, accounting, data management, create charts and graphs, etc. However, we mostly prefer to use CSV files for data import and export. It’s one approach is to transfer the structured data using a plain text file.

What Is an Xslx File?

When making spreadsheets in the latest versions of Excel, Microsoft uses an XLSX file as the default file type. Similar to Word’s DOCX file type, we can open the XLSX files using a variety of applications/programs.

What Is a CSV File?

The CSV file format is what we can use to store the tabular data values in spreadsheets and databases. The tabular data, text or text, is stored as plain text in a CSV file. A record in the CSV file is the data value stored on each row of the file. Each record can have more than one field that is separated by commas.

Key Differences Between XLSX and CSV Files

Users often use CSV and XLSX interchangeably, and they are generally unaware of the distinctions between them, what their fundamental characteristics are, and how they work. It only serves as a source of information for the user. Nearly every business and corporate use both to assist the users in managing, updating, and storing the data for the user. The following are some key differences between XLSX and CSV files that you should know:

  • The CSV format saves tabular data into a delimiter text file with the .csv extension. Whereas, excel or XLSX file is a spreadsheet that stores files in its proprietary format, viz. xls or XLSX.
  • Excel files are binary files that contain the data on every worksheet in a workbook. Whereas, CSV is a plain and simple text format with a sequence of data values separated by commas.
  • Data operations cannot be carried out on CSV files. However, they can be carried out on Excel files.
  • CSV files are quicker and use less memory than XLSX files. However, Excel uses more memory when importing the data.
  • If we compare CSV with Excel, we can open the CSV files in any Windows text editor while Excel files cannot.

How Can an XLSX File be Converted to CSV Using Python?

An XLSX file can be transformed into a CSV file in Python using a variety of methods. Python includes a variety of modules and functions which can assist us to accomplish this task. Here, we’ll go through a couple of methods that can be used to convert the XLSX files to CSV files in Python.

Prerequisites to Perform the Excel to CSV Conversion

Installing the requirements is the first step. We use the modules like combine Pandas, CSV, and Openpyxl, etc. in this tutorial. Python’s Pandas package allows for the manipulation and analysis of data. Pandas is a well-known library among Python programmers. Excel files can be read and written using the Python package known as Openpyxl. We won’t be dealing with this library directly. Pandas instead make use of Openpyxl internally.

Using the PyPI repository, we can install both packages:

Method 1: Converting XLSX to CSV Using the Pandas Module

Example 1: Converting One XLSX File to CSV

Pandas is an open-source module created for the Python programming language to manipulate and analyze the data. To work with time series and numerical tables, Pandas provides a variety of functionality and features. Using pandas, both small and huge datasets can be read, filtered, and rearranged. And the results can be produced in a variety of formats such as Excel, JSON, and CSV. We use the read_excel() method of Pandas to read an excel file, and the to_csv() method to transform the DataFrame into a CSV file.

We print our .csv file as a DataFrame as can be seen in the output. The previous script converts our XLSX file into CSV and creates a “salary.csv” file in the current directory.

Example 2: Converting XLSX File (with Multiple Sheets) to CSV

In the example that follows, we first read the Excel file’s sheet names. The names of individual sheets are then looped through and saved as distinct CSV files. The script saves CSVs in the same location.

We use a sample XLSX file having two or more worksheets or spreadsheets.

As can be seen, our XLSX file consists of two sheets (sheet1 and sheet2). Now, we write a code to convert this XLSX file into CSV.

Output:

The script successfully converted the XLSX file with two sheets into a single CSV file with a single sheet.

Example 3: Converting Multiple XLSX Files into Separate CSV Files

Consider a scenario in which you require to convert several Excel files in your working directory into CSV. You could employ this method. We start by assuming that each file only contains one sheet. Then, we extend our method to handle multiple files and several sheets. The Python code that follows makes use of the glob standard module. We match the file paths using patterns with Glob. It matches all files with the .xlsx extension in the working directory. After that, we develop a function that reads the Excel files and saves them as CSV files. We call this function on each file path that is found.

This previous script converts all the xlsx files in the current directory into CSV files.

Now, we convert the XLSX files with multiple spreadsheets into CSV. This is most likely the difficult part. Three Excel files are in our working directory. Also, some of them consist of more than one sheet. We aim to:

  1. Create a directory for each spreadsheet file,
  2. Convert the excel file sheets to CSV and they should be kept in the newly created directory.

The script converts the XLSX with single and multiple sheets into separate CSV files and store them in new directories with the same name.

To retrieve every Excel file located in the current directory, the glob function is used. After that, folders are created for each XLSX file using the os.mkdir function. A CSV file is then created for each sheet inside the new directory by looping over the sheet name.

Method 2: Converting XLSX to CSV Using Openpyxl and CSV Modules

In this method, we will use the openpyxl and CSV modules to convert the XLSX files into CSV. To read and write 2010 xlsx, xlsm, xltx, and xltm files, the Python module openpyxl can be used. The CSV module includes classes that enable the reading and writing of CSV-formatted tabular data.

To read or load our xlsx file, we use the load_workbook() function of the openpyxl module. You can use this function if you need to write or read an existing XLSX/Excel file in Python. After the activation of excel, we use the csv.writer() function to create our CSV file. Then, the for-loop is used to store the data in the data cells of the CSV file. We convert our example.xlsx file into myfile.csv as seen in the following image:

Conclusion

In this tutorial, we gave you a short introduction to XLSX files and CSV files. We explained the key differences between both file formats. We discussed two methods with multiple examples to teach how the XLSX files with single or multiple sheets can be converted into CSV files. We implemented an example to convert multiple XLSX files into CSV files simultaneously. It can be simple or difficult to convert the data from Excel to CSV. If you only have one file with a small number of sheets, it’s simple. But if not, it can be difficult.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.