
Pandas DataFrame from CSV

A DataFrame is a two-dimensional data structure provided by the Python Pandas library that stores data in a tabular format, that is, in rows and columns. Each column in a DataFrame can hold a different type of data.

CSV (“comma-separated values”) files store values separated by commas and can be viewed much like an Excel spreadsheet. “Pandas” is one of the most essential data science packages in Python. When analyzing data, we often deal with large datasets, which are commonly stored in CSV format. There are several ways to create a Pandas DataFrame from a CSV file. The technique we chose to explain and implement in this article is the Pandas “read_csv()” method, which is essential for reading and processing CSV files.

The following example demonstrates this method in practice.

Example: Utilizing the Pandas “read_csv()” Method to Create a DataFrame from CSV

In this illustration, we will see how to create a DataFrame from a CSV file using the Pandas “pd.read_csv()” method. Let’s put this concept into practice.

Whichever programming language you choose for your requirements, you need a tool in which to write and run your code, and there are many options available. In this article, the programming language is “Python”, so we need a tool that supports Python and is compatible with our system. From the available choices, we selected the “Spyder” tool, which can be downloaded from the official “Spyder” website.

When the download is complete, launch the installation wizard. Once the installation is done, you can open the tool by typing its name in your system’s search bar. Clicking it opens the “Spyder” interface, and we are ready to start the practical demonstration.

In the “Spyder” interface, click the “New file” button or press “Ctrl+N” to open a new file. The new file has a “.py” extension, which marks it as a “Python” file. The first step in writing any script is to import the libraries whose features you want to use. Since this illustration is based on “Pandas” features, we first import the library with the line “import pandas as pd”. Here, “pd” is a short alias for Pandas, so Pandas functions can now be called through “pd”.
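As a quick check of the import step, the following minimal sketch imports Pandas under the “pd” alias and confirms that “read_csv()” is reachable through it. The two print statements are only illustrative sanity checks and are not part of the article’s script.

```python
# Import Pandas under its conventional alias "pd"; the alias is only a
# naming choice, so every Pandas function is now reached as pd.<name>.
import pandas as pd

# Illustrative sanity checks that the import worked.
print(pd.__version__)          # the installed Pandas version string
print(callable(pd.read_csv))   # True: read_csv is available through pd
```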

Now that the Pandas library is imported, the next task is to create a DataFrame from a CSV file. You have two options: create your own CSV file in Microsoft Excel, Google Sheets, or a similar tool and save it with the “.csv” extension, or download a sample CSV file from the internet for learning purposes. We downloaded a sample CSV file from the internet. We then invoked the “pd.read_csv()” method, which reads the provided CSV file; the name of the CSV file goes between its parentheses.

In our case, the file is named “weekday.csv”. One important point is that a bare file name like this is looked up in the current working directory, which in our setup is the folder where the “.py” files reside inside the “.spyder-py3” folder. If the CSV file is stored somewhere else, executing the program throws an error, unless you pass the file’s full path instead. When we call “pd.read_csv(“weekday.csv”)”, it reads the content of the file and creates a DataFrame. We stored this DataFrame in a variable named “sample”, which holds the output of the “pd.read_csv()” method. Lastly, we invoked the “print()” method to display the DataFrame in the console.
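The steps in this paragraph can be sketched as a short script. This is a minimal sketch, assuming the file is named “weekday.csv” as in the article; the helper function “load_csv” is a hypothetical name introduced here. If the file is not in the working directory, the helper catches the “FileNotFoundError” that “pd.read_csv()” raises and reports where Pandas looked, instead of stopping with a traceback.

```python
import os

import pandas as pd


def load_csv(path):
    """Hypothetical helper: read a CSV into a DataFrame, or return None if missing."""
    # A bare file name such as "weekday.csv" is resolved against the
    # current working directory, not necessarily the script's folder.
    full_path = os.path.abspath(path)
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        # Move the file into the working directory or pass its full path.
        print(f"File not found: {full_path}")
        print(f"Current working directory: {os.getcwd()}")
        return None


# Store the DataFrame in "sample" and display it, as in the article.
sample = load_csv("weekday.csv")
if sample is not None:
    print(sample)
```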

If you are new to “Python” and “Spyder”, you might wonder how to execute the script. Press the “Run File” button in the “Spyder” interface (shortcut “F5”) to run the program. Here is our DataFrame created from the provided CSV file.

The resulting DataFrame has four columns and seven rows. The first column, “Name”, stores the names of the weekdays: “Monday”, “Tuesday”, “Wednesday”, “Thursday”, “Friday”, “Saturday”, and “Sunday”. The second column, “Abbreviation”, stores their short forms: “Mon.”, “Tue.”, “Wed.”, “Thu.”, “Fri.”, “Sat”, and “Sun”. The third and fourth columns, “Numeric” and “Numeric-2”, both hold numeric values for the weekdays, with numbers ranging from “0” to “7”.
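Since the output image is not reproduced here, the following sketch rebuilds a stand-in for “weekday.csv” in memory and checks its size. The cell values, especially in the “Numeric” and “Numeric-2” columns, are hypothetical placeholders rather than the contents of the author’s file, and “io.StringIO” is used only so the example runs without a file on disk; with the real file, you would pass “weekday.csv” instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for weekday.csv; the numeric values are assumed.
csv_text = """Name,Abbreviation,Numeric,Numeric-2
Monday,Mon.,1,0
Tuesday,Tue.,2,1
Wednesday,Wed.,3,2
Thursday,Thu.,4,3
Friday,Fri.,5,4
Saturday,Sat,6,5
Sunday,Sun,7,6
"""

# io.StringIO lets read_csv treat the text as if it were a file.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.shape)             # (7, 4): seven rows, four columns
print(sample.columns.tolist())  # ['Name', 'Abbreviation', 'Numeric', 'Numeric-2']
```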

Sometimes you may want to create a DataFrame from only selected columns of the CSV file. The same “pd.read_csv()” function handles this through its “usecols” parameter, which takes the names of the columns to retrieve from the CSV file. Out of the four columns we saw earlier, we passed the “Name” and “Numeric” columns to create the DataFrame. Then, we invoked the “print()” method to display the selected columns.
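A sketch of the “usecols” call, again using a hypothetical in-memory copy of the weekday data (placeholder values) in place of the real “weekday.csv” file. Pandas keeps the selected columns in the order they appear in the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for weekday.csv; the numeric values are assumed.
csv_text = """Name,Abbreviation,Numeric,Numeric-2
Monday,Mon.,1,0
Tuesday,Tue.,2,1
Wednesday,Wed.,3,2
Thursday,Thu.,4,3
Friday,Fri.,5,4
Saturday,Sat,6,5
Sunday,Sun,7,6
"""

# Keep only the "Name" and "Numeric" columns while reading.
sample = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Numeric"])
print(sample)
```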

Running the script yields an output DataFrame containing only those two columns from the CSV file, as shown in the following image:

Apart from selecting columns, you can perform other operations while creating a DataFrame from the provided CSV file. A CSV file may contain a large amount of data, not all of which needs to appear in your DataFrame, and unneeded data can clutter the result. One way to avoid this is to skip irrelevant rows by adding the “skiprows” parameter and specifying the row numbers you want to exclude. We specified the row numbers “[1, 3, 5]” here and then called the “print()” method to show the new DataFrame.

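In the output image, you can observe that the DataFrame created from the CSV file does not contain the rows at lines “1”, “3”, and “5” of the file. Because “skiprows” counts file lines starting from 0, and line 0 holds the column headings, these are the first, third, and fifth data rows.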
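The “skiprows” behavior can be reproduced with the same hypothetical in-memory data (placeholder values, not the author’s file). Skipping file lines 1, 3, and 5 drops Monday, Wednesday, and Friday in this stand-in, and Pandas renumbers the remaining rows from 0.

```python
import io

import pandas as pd

# Hypothetical stand-in for weekday.csv; the numeric values are assumed.
csv_text = """Name,Abbreviation,Numeric,Numeric-2
Monday,Mon.,1,0
Tuesday,Tue.,2,1
Wednesday,Wed.,3,2
Thursday,Thu.,4,3
Friday,Fri.,5,4
Saturday,Sat,6,5
Sunday,Sun,7,6
"""

# Line 0 is the header, so lines 1, 3 and 5 are the 1st, 3rd and 5th data rows.
sample = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 3, 5])
print(sample)
```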

We can also rename the columns of the CSV file to suit our requirements when we call the “pd.read_csv()” function. To do this, we pass a list of strings to the function’s “names” parameter; these strings become the new column names. It also makes sense to exclude the first row of the input file, because it contains the CSV file’s original column headings, which would otherwise appear as the first data row. We provided the column names as “names=[‘C1’, ‘C2’, ‘C3’, ‘C4’]”. Finally, we displayed the DataFrame with the new column names.
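A minimal sketch of renaming columns on read, using the same hypothetical in-memory data. It assumes “header=0” as the way to drop the original heading line (passing “skiprows=1” together with “names” would have the same effect here); the article does not show which option the author used. The second read shows what happens when the heading line is not excluded.

```python
import io

import pandas as pd

# Hypothetical stand-in for weekday.csv; the numeric values are assumed.
csv_text = """Name,Abbreviation,Numeric,Numeric-2
Monday,Mon.,1,0
Tuesday,Tue.,2,1
Wednesday,Wed.,3,2
Thursday,Thu.,4,3
Friday,Fri.,5,4
Saturday,Sat,6,5
Sunday,Sun,7,6
"""
new_names = ["C1", "C2", "C3", "C4"]

# header=0 tells Pandas that line 0 is a header to be replaced by new_names.
sample = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)
print(sample)

# Without header=0, the original heading line becomes an extra data row.
with_heading_row = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(with_heading_row.head(2))
```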

Running the script gives the following output DataFrame:

Conclusion

DataFrames are among the most widely used and important building blocks of Python Pandas. There are several ways to create a DataFrame in Pandas; in this article, we discussed how to create one from a CSV file. We used the Pandas “read_csv()” method to read the provided CSV file and create a DataFrame from it. Through practical examples executed in “Spyder”, we demonstrated how to use this function, and we also explained and implemented several of its useful parameters, such as “usecols”, “skiprows”, and “names”, to achieve the desired outcome. We hope this effort to make learning Pandas easy helps you build your Python skills.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.