
Pandas Change Column Type

In programming, a column is a set of values of the same type, arranged vertically from top to bottom in a table-like structure such as a Pandas DataFrame. When you work with a DataFrame that has already been created, you may need to change the datatype of one or more of its columns. This article explains how to change the datatype of columns in Pandas DataFrames, using several functions to convert a single column or multiple columns.

How to Change the Column’s Datatype in Pandas

Pandas offers several functions for changing the datatype of DataFrame columns. The following examples explain each of them in detail.

Example 1: Using the DataFrame.astype() Function

To create a sample DataFrame, we first import Pandas as pd so that we can use its functionality. To demonstrate the astype() function, we create a DataFrame with three columns that hold different kinds of values. The DataFrame.astype() method casts a Pandas object to a specified dtype. It can also convert any suitable column of the DataFrame to the category dtype.
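The following is a minimal sketch of the sample DataFrame described below. The X and Y values come from the article. The Z values are hypothetical, because the article only says that Z mixes floats and strings.

```python
import pandas as pd

# Sample data modelled on the article; the Z values are hypothetical.
df = pd.DataFrame(
    {
        "X": ["2", 4, 6, "8", 10],           # ints mixed with digit strings
        "Y": ["p", "q", "r", "s", "t"],      # letters only
        "Z": [1.5, "2.5", 3.5, "4.5", 5.5],  # floats mixed with numeric strings
    }
)

# dtypes returns a Series that maps each column name to its dtype.
# Mixed columns such as X and Z are stored with the generic "object" dtype.
print(df.dtypes)
```

On pandas 1.x and 2.x all three columns report object. Newer releases may infer a dedicated string dtype for the all-text Y column.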


As can be seen in the previous illustration, we created three columns: X, Y, and Z. The X column contains a mix of int and string values [‘2’, 4, 6, ‘8’, and 10], the Y column contains only string values [‘p’, ‘q’, ‘r’, ‘s’, ‘t’], and the Z column contains a mix of float and string values. To view the datatype of each column, we use the dtypes attribute, which returns a Series containing the datatype of every column in the DataFrame.


As the previous illustration shows, the datatype of every column is reported as “object”. Each column (X, Y, and Z) contains at least one string value, so Pandas stores it with the general-purpose “object” dtype. Now, let’s change the datatype of column X using the astype() function.
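The conversion of column X can be sketched as follows, assuming the same sample data as before (the Z values are again hypothetical). Passing a dictionary to astype() limits the cast to the listed column. Reassigning the result keeps the change.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "X": ["2", 4, 6, "8", 10],
        "Y": ["p", "q", "r", "s", "t"],
        "Z": [1.5, "2.5", 3.5, "4.5", 5.5],  # hypothetical values
    }
)

# astype() returns a new DataFrame, so the result is assigned back to df.
# Only column X is cast; Y and Z keep their original dtypes.
df = df.astype({"X": int})

# int maps to NumPy's default integer type: int64 on most platforms,
# int32 on Windows with NumPy versions before 2.0.
print(df.dtypes)
print(df["X"].tolist())
```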


We converted column X of the “df” DataFrame to the int datatype and assigned the resulting DataFrame back to the variable “df”. This is necessary because astype() returns a new DataFrame rather than modifying the original one. Now we can use the dtypes attribute to check whether the datatype of column X has changed.


The datatype of column X has changed from object to “int32”. The int type maps to NumPy’s platform default integer, which is int32 on Windows with NumPy versions before 2.0 and int64 on most other setups. You can either apply a single datatype to the entire DataFrame or assign a separate datatype to each column using a Python dictionary. Let’s assign a different datatype to each column of the DataFrame using a dictionary.
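Here is a sketch of the per-column dictionary form. It assumes the sample data from earlier, including the hypothetical Z values. The final lines, which are not part of the article’s example, show the same dictionary form producing the category dtype mentioned at the start of this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "X": ["2", 4, 6, "8", 10],
        "Y": ["p", "q", "r", "s", "t"],
        "Z": [1.5, "2.5", 3.5, "4.5", 5.5],  # hypothetical values
    }
)

# Each key names a column; each value is the dtype to cast that column to.
df = df.astype({"X": int, "Y": "string", "Z": float})
print(df.dtypes)

# The same dict form can turn a suitable column into a categorical one.
as_category = df.astype({"Y": "category"})
print(as_category.dtypes)
```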


In the previous dictionary, we mapped column X to “int”, column Y to “string”, and column Z to “float” in the “df” DataFrame. Let’s use the dtypes attribute to check the current datatypes of columns X, Y, and Z.


The datatype of each column was changed successfully. We can also use the astype() function to apply a single datatype to all columns of the DataFrame.
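The following sketch passes a single dtype instead of a dictionary, so every column is cast. It assumes the same sample data with hypothetical Z values. It contrasts Python’s str, which gives the “object” dtype on pandas 1.x and 2.x, with the “string” alias for Pandas’ dedicated StringDtype.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "X": ["2", 4, 6, "8", 10],
        "Y": ["p", "q", "r", "s", "t"],
        "Z": [1.5, "2.5", 3.5, "4.5", 5.5],  # hypothetical values
    }
)

# A single dtype, rather than a dict, is applied to every column.
as_str = df.astype(str)          # Python str objects (object dtype on pandas 1.x/2.x)
as_string = df.astype("string")  # Pandas' dedicated StringDtype

print(as_str.dtypes)
print(as_string.dtypes)
```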


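We called the astype() method on the DataFrame “df” and passed Python’s str type as the argument. This changes the datatype of every column to “object”, because astype(str) stores the values as Python strings (in pandas 1.x and 2.x). Passing the alias “string” instead produces Pandas’ dedicated “string” dtype.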

Example 2: Using the to_numeric() Function

The apply() method lets us convert specific columns, or all of them, to a numeric, datetime, or timedelta type by passing pandas.to_numeric, pandas.to_datetime, or pandas.to_timedelta as its argument. The to_numeric() function converts a column to an int or float datatype depending on the values it contains. A column that contains only integer numbers becomes “int64”, while a column that contains values with decimal points becomes “float64”. To show this, let’s create a DataFrame whose values are integers stored as strings.
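As a quick illustration of passing each converter to apply(), here is a sketch on a small hypothetical DataFrame in which every value is text. The column names and values are made up for this example. The dtype=object argument keeps the starting dtype consistent across pandas versions.

```python
import pandas as pd

# Hypothetical text-only data, pinned to the object dtype.
df = pd.DataFrame(
    {
        "count": ["1", "2", "3"],
        "price": ["1.5", "2.25", "3.0"],
        "day": ["2024-01-01", "2024-01-02", "2024-01-03"],
        "wait": ["1 days", "2 hours", "00:30:00"],
    },
    dtype=object,
)

# apply() calls the given function once per column of the selection
# and assembles the converted columns into a new DataFrame.
numbers = df[["count", "price"]].apply(pd.to_numeric)
days = df[["day"]].apply(pd.to_datetime)
waits = df[["wait"]].apply(pd.to_timedelta)

print(numbers.dtypes)  # count: int64, price: float64
print(days.dtypes)
print(waits.dtypes)
```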


We first import the Pandas and NumPy modules. Then, we create the DataFrame using the pd.DataFrame() constructor and pass it three lists: [‘1’, ‘2’, ‘3’], [‘4’, ‘5’, ‘6’], and [‘7’, ‘8’, ‘9’]. The column names are specified as “x”, “y”, and “z”. The dtypes attribute shows the datatypes of the x, y, and z columns.
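Here is a sketch of that DataFrame. It assumes each of the three lists is one row. NumPy is omitted because this step does not use it. The dtype=object argument is an addition that reproduces the “object” dtype on pandas versions that would otherwise infer a dedicated string dtype.

```python
import pandas as pd

# Each inner list becomes one row; every value is a digit stored as a string.
df = pd.DataFrame(
    [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"]],
    columns=["x", "y", "z"],
    dtype=object,
)

print(df)
print(df.dtypes)  # x, y, z: object
```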


Currently, the datatype of each column is “object”. We now use the to_numeric() function to change it.
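The conversion can be sketched as follows, assuming the same three-row DataFrame. Because to_numeric() accepts one Series at a time, apply() runs it over every column and returns a new DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"]],
    columns=["x", "y", "z"],
    dtype=object,
)

# Run pd.to_numeric on each column; whole-number strings become int64.
df = df.apply(pd.to_numeric)
print(df.dtypes)  # x, y, z: int64
```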


As mentioned previously, if a column contains only whole numbers, the to_numeric() function changes its datatype to “int64”. Let’s check whether the datatypes have changed.


Because the “df” DataFrame stores whole numbers as strings, the to_numeric() function converted every column to “int64”. What if a column contains float values stored as strings? Will the to_numeric() function change its datatype to “float64”? To answer this question, we add another column of float values stored as strings to the “df” DataFrame.


We added another column to the “df” DataFrame and specified the column names as w, x, y, and z. The newly added z column contains only decimal values stored as strings. Let’s apply the to_numeric() function and see the results.
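Here is a sketch of the four-column version. The decimal strings in column z are hypothetical, because the article does not list them. As before, dtype=object pins the starting dtype.

```python
import pandas as pd

# Columns w, x, y hold whole-number strings; z holds decimal strings
# (hypothetical values).
df = pd.DataFrame(
    [["1", "2", "3", "1.5"], ["4", "5", "6", "2.5"], ["7", "8", "9", "3.5"]],
    columns=["w", "x", "y", "z"],
    dtype=object,
)

# to_numeric picks int64 or float64 per column, based on its values.
converted = df.apply(pd.to_numeric)
print(converted.dtypes)  # w, x, y: int64; z: float64
```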


The to_numeric() function converted the columns containing whole numbers to “int64” and the column containing decimal numbers to “float64”.

Example 3: Using the convert_dtypes() Function

In the previous two examples, we changed the datatype of DataFrame columns using the astype() and to_numeric() functions. Another option is the convert_dtypes() method. It inspects the data and returns a new DataFrame in which each column is converted to the best possible dtype, preferring Pandas’ nullable dtypes that support pd.NA. To understand how convert_dtypes() works, let’s first create a sample DataFrame.
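The following sketch illustrates what “best possible dtype” means, using a small hypothetical DataFrame that is not part of the article. Its float column holds only whole numbers plus a missing value, which convert_dtypes() turns into the nullable Int64 type. Its text column becomes “string”. The original DataFrame is left unchanged.

```python
import pandas as pd

# Hypothetical data: whole numbers stored as floats (NaN forces float64)
# and a text column explicitly stored as object.
df = pd.DataFrame(
    {
        "n": pd.Series([1.0, 2.0, None]),
        "label": pd.Series(["a", "b", "c"], dtype=object),
    }
)

converted = df.convert_dtypes()  # returns a new DataFrame

print(df.dtypes)         # n: float64, label: object
print(converted.dtypes)  # n: Int64, label: string
```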


After importing the Pandas module, we created two lists, “name” and “student”. The “name” list contains strings and a null value (“Tom”, pd.NA, “Max”, “Tony”), while the “student” list contains boolean values and some null values (True, pd.NA, False, pd.NA). We combined both lists in the variable “data” and passed it as an argument to the pd.DataFrame() function to create the DataFrame. Then, we used the dtypes attribute to display the datatypes of all the columns. As seen in the previous illustration, the datatype of each column is “object”. Let’s use the convert_dtypes() function to convert each column to a more suitable datatype.
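Here is a sketch of this example. It assumes that “data” is a dictionary mapping each column name to its list. The dtype=object argument is an addition that reproduces the article’s all-object starting point on any pandas version.

```python
import pandas as pd

# Column name -> list of values; pd.NA marks the missing entries.
data = {
    "name": ["Tom", pd.NA, "Max", "Tony"],
    "student": [True, pd.NA, False, pd.NA],
}
df = pd.DataFrame(data, dtype=object)
print(df.dtypes)  # name: object, student: object

# convert_dtypes() picks nullable dtypes that can hold pd.NA.
converted = df.convert_dtypes()
print(converted.dtypes)  # name: string, student: boolean
```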


As can be seen, the convert_dtypes() function converted each column to the most suitable datatype. The “name” column is now “string”, and the “student” column is now “boolean”.

Conclusion

In this tutorial, we covered several ways to change the datatype of a DataFrame’s columns. Through multiple examples, we showed how to use the DataFrame.astype(), to_numeric(), and convert_dtypes() functions so that you can change column types in your own DataFrames.

About the author

Aqsa Yasin

I am a self-motivated information technology professional with a passion for writing. I am a technical writer and love to write for all Linux flavors and Windows.