Pandas is one of Python’s most valuable data analysis and manipulation packages.
It offers features such as custom data structures, notably the Series and the DataFrame, which are built on top of NumPy.
This article will discuss converting a column from one data type to an int type within a Pandas DataFrame.
Setting Up Pandas
Before diving into how to perform the conversion operation, we need to set up Pandas in our Python environment.
If you are using the base environment of an Anaconda distribution, Pandas is most likely already installed.
However, on a standard Python install, you will need to install it manually.
You can do that with pip:
$ pip install pandas
On Linux, where the command is often named pip3, run:
$ pip3 install pandas
In Anaconda or Miniconda environments, install pandas with conda:
$ conda install pandas
Pandas Create Sample DataFrame
Let us set up a sample DataFrame for illustration purposes in this tutorial. You can copy the code below or use your DataFrame.
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke', 'Lovrenco Ilar', 'Steffen Angus'],
                   'points': ['50000', '70899', '70000', '81000', '110000']})
Once the DataFrame is created, we can check the data by printing it, for example with print(df).
Pandas Show Column Type
Before converting a column to an int, it is good to check whether its existing values can be cast to one.
For example, a column containing names cannot be converted to an int.
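As a small illustration of why this check matters, the sketch below tries to cast a column of names to int and catches the resulting error. The two-row DataFrame and the name_is_convertible flag are made up for the example.

```python
import pandas as pd

# Illustrative two-row DataFrame (not the tutorial's full sample data).
df = pd.DataFrame({'id': ['1', '2'],
                   'name': ['Marja Jérôme', 'Alexios Shiva']})

try:
    df['name'].astype(int)
    name_is_convertible = True
except ValueError as exc:
    # Strings such as 'Marja Jérôme' are not valid integer literals.
    name_is_convertible = False
    print(f"Cannot convert 'name' to int: {exc}")

# Digit-only strings, by contrast, convert cleanly.
ids = df['id'].astype(int)
print(ids.tolist())
```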
We can view the type of each column in a DataFrame using the dtypes property, accessed as DataFrame.dtypes.
In our sample DataFrame, we can get the column types as:
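A minimal, self-contained sketch of that check. The sample DataFrame is rebuilt here only so the block runs on its own. Printing df.dtypes gives a listing like the one that follows.

```python
import pandas as pd

# Sample DataFrame from earlier, rebuilt so this block runs on its own.
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke',
                            'Lovrenco Ilar', 'Steffen Angus'],
                   'points': ['50000', '70899', '70000', '81000', '110000']})

# dtypes is a property (no parentheses); it returns one dtype per column.
column_types = df.dtypes
print(column_types)
```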
id object
name object
points object
dtype: object
We can see from the output above that none of the columns holds an int type; all three are stored as object, which is how Pandas reports columns of Python strings here.
Pandas Convert Column From String to Int
To convert a single column to an int, we use the astype() function and pass the target data type as the parameter.
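The function accepts the following parameters:
- dtype – specifies the Python type or NumPy dtype to which the object is converted.
- copy – when True, returns a copy of the data. With copy=False, changes to the result may propagate back to the original object.
- errors – specifies what happens when a value cannot be converted. The default, 'raise', raises an exception; 'ignore' suppresses it and returns the original object.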
In our sample DataFrame, we can convert the id column to int type using the astype() function as shown in the code below:
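A minimal sketch of that conversion, with the sample DataFrame rebuilt so the block runs on its own:

```python
import pandas as pd

# Sample DataFrame from earlier, rebuilt so this block runs on its own.
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke',
                            'Lovrenco Ilar', 'Steffen Angus'],
                   'points': ['50000', '70899', '70000', '81000', '110000']})

# astype() returns a new Series, so assign the result back to the column.
df['id'] = df['id'].astype(int)

print(df.dtypes)
```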
The code above specifies the ‘id’ column as the target object. We then pass an int as the type to the astype() function.
We can check the new data type for each column in the DataFrame:
id int32
name object
points object
dtype: object
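The id column has been converted to an int while the rest remain unchanged. The exact width depends on the platform: int maps to NumPy's default integer type, which is int64 on most systems, while int32 appears on Windows with NumPy versions before 2.0.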
Pandas Convert Multiple Columns to Int
The astype() function can also convert several columns to the same type in one call.
For example, we can run the following code to convert the id and points columns to int type.
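One way to write this, sketched with the sample DataFrame rebuilt so the block runs on its own:

```python
import pandas as pd

# Sample DataFrame from earlier, rebuilt so this block runs on its own.
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke',
                            'Lovrenco Ilar', 'Steffen Angus'],
                   'points': ['50000', '70899', '70000', '81000', '110000']})

# Select both columns with a list inside the square brackets,
# convert them together, and assign the result back.
df[['id', 'points']] = df[['id', 'points']].astype(int)

print(df.dtypes)
```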
Here, we are specifying multiple columns using the square bracket notation. This allows us to convert the columns to the data type specified in the astype() function.
If we check the column type, we should see an output:
id int32
name object
points int32
dtype: object
We can now see that the id and points columns have been converted to an int type (int32 in this output).
Pandas Convert Multiple Columns to Multiple Types
The astype() function also accepts a dictionary that maps column names to target types.
Assume that we want to convert the id column to int32 and the points column to float64.
We can run the following code:
convert_to = {'id': 'int32', 'points': 'float64'}
df = df.astype(convert_to)
In the code above, we start by defining a dictionary holding the target column as the key and the target type as the value.
We then use the astype() function to convert the columns in the dictionary to the set types.
Checking the column types should return:
id int32
name object
points float64
dtype: object
Note that the id column is int32 and the points column is of float64 type.
Pandas Convert Column to Int – to_numeric()
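Pandas also provides us with the to_numeric() function. This function allows us to convert a column to a numeric type.
In its simplest form, the call is pd.to_numeric(arg), where arg is the column (Series) to convert; the optional errors and downcast parameters control how failures are handled and how wide the resulting type is.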
For example, to convert the id column to numeric in our sample DataFrame, we can run:
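A minimal sketch of that call, with the sample DataFrame rebuilt so the block runs on its own:

```python
import pandas as pd

# Sample DataFrame from earlier, rebuilt so this block runs on its own.
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke',
                            'Lovrenco Ilar', 'Steffen Angus'],
                   'points': ['50000', '70899', '70000', '81000', '110000']})

# to_numeric() is a top-level pandas function; it returns a new Series,
# so assign the result back to the column.
df['id'] = pd.to_numeric(df['id'])

print(df['id'].dtype)
```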
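The code takes the id column and converts it to an integer type. Unlike astype(), to_numeric() infers the result type from the data, so whole-number strings become int64.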
Pandas Convert DataFrame to Best Possible Data Type
The convert_dtypes() function in Pandas allows us to convert an entire DataFrame to the best-fitting types that support missing values (pd.NA).
The function syntax is as shown:
DataFrame.convert_dtypes(infer_objects=True, convert_string=True,
convert_integer=True, convert_boolean=True, convert_floating=True)
You can check the docs in the resource below:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.convert_dtypes.html
For example, to convert our sample DataFrame to the nearest possible type, we can run:
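A minimal sketch, assuming the DataFrame is in the state left by the earlier sections (id already cast to int32 and points to float64). That starting state is an assumption made here so the block runs on its own.

```python
import pandas as pd

# Sample DataFrame from earlier, rebuilt so this block runs on its own.
df = pd.DataFrame({'id': ['1', '2', '3', '4', '5'],
                   'name': ['Marja Jérôme', 'Alexios Shiva', 'Mohan Famke',
                            'Lovrenco Ilar', 'Steffen Angus'],
                   'points': ['50000', '70899', '70000', '81000', '110000']})

# Assumed starting state: id as int32 and points as float64.
df = df.astype({'id': 'int32', 'points': 'float64'})

# convert_dtypes() returns a new DataFrame; assign it back to keep the result.
df = df.convert_dtypes()

print(df.dtypes)
```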
If we check the type:
id Int32
name string
points Int64
dtype: object
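You will notice that each column has been converted to a nullable extension type (note the capital I in Int32 and Int64). This output starts from the DataFrame as converted in the previous sections, with id as int32 and points as float64. An existing integer column keeps its width, so the int32 id column becomes Int32.
Likewise, the name column is converted to string type as it holds string values.
Finally, since the points column held float64 values that are all whole numbers, it is converted to Int64. Run on the original all-string DataFrame, convert_dtypes() would instead report every column as string.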
Conclusion
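In this article, we covered several ways to convert the columns of a Pandas DataFrame to an int type: astype() for one or more columns, to_numeric() for a single column, and convert_dtypes() for an entire DataFrame.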