
Pandas to Numeric

In this guide, we will explore how to convert an argument to a numeric type in a pandas Series or DataFrame using the pandas.to_numeric() function. We will also see how to handle the result when string data is passed to this function.

pandas.to_numeric()

The pandas.to_numeric() function converts the specified argument to a numeric type. It supports only one-dimensional data such as a Series, list, or tuple. For a pandas DataFrame, pass one column at a time to this function.
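As a rough sketch of the one-dimensional restriction, the snippet below passes a list, a tuple, and a single DataFrame column, and then shows that a whole DataFrame is rejected. The frame and its column names are made up for illustration.

```python
import pandas

# Hypothetical sample data, used only for this illustration
frame = pandas.DataFrame({"a": ["1", "2"], "b": ["3.5", "4"]})

# One-dimensional inputs: a list, a tuple, and a single DataFrame column
from_list = pandas.to_numeric(["1", "2", "3"])
from_tuple = pandas.to_numeric(("4", "5"))
from_column = pandas.to_numeric(frame["b"])

# A whole DataFrame is two-dimensional, so to_numeric() rejects it
try:
    pandas.to_numeric(frame)
    rejected = False
except TypeError:
    rejected = True

print(from_list, from_tuple, from_column, rejected, sep="\n")
```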

The default return data type of this function is float64 or int64, depending on the data supplied to the function. The function also accepts parameters that control error handling and downcasting during the conversion itself.
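To illustrate the default return types, here is a minimal sketch with two made-up lists of numeric strings. One holds only whole numbers and the other contains a decimal value.

```python
import pandas

# Only whole numbers: the result uses int64
ints = pandas.to_numeric(["1", "2", "3"])

# At least one decimal value: the result uses float64
floats = pandas.to_numeric(["1", "2.5", "3"])

print(ints.dtype)    # int64
print(floats.dtype)  # float64
```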

Syntax

Let’s see the syntax of the pandas.to_numeric() function with parameters in detail.

pandas.to_numeric(arg, errors='raise', downcast=None)
  1. arg – the one-dimensional input data to be converted into a numeric type. We can pass a DataFrame column, Series, list, tuple, one-dimensional array, etc.
  2. errors – controls what happens when an element cannot be parsed as a number (for example, the string ‘Java’). Strings that hold valid numbers, such as ‘79’, are converted without any error.
    1. If errors = ‘raise’ (the default), then invalid parsing will raise a ValueError exception.
    2. If errors = ‘coerce’, then invalid elements will be set as NaN.
    3. If errors = ‘ignore’, then the input is returned unchanged when parsing fails. This option is deprecated since pandas 2.2.
  3. downcast – we can reduce the data type of the result while converting the argument to numeric.
    1. If downcast is ‘integer’ or ‘signed’, then the data is cast into the smallest signed integer type that can hold the values, starting from int8.
    2. If downcast is ‘unsigned’, then the data is cast into the smallest unsigned integer type that can hold the values, starting from uint8.
    3. If downcast is ‘float’, then the data is cast into the smallest float type that can hold the values, starting from float32.
    4. If downcast is None (the default), no downcasting is performed.
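As a small sketch of how the errors options listed above behave, the snippet below uses made-up lists of strings. It shows that strings that look like numbers convert cleanly, while an unparseable string either raises an exception or becomes NaN.

```python
import pandas

# Strings that look like numbers are parsed without any error
clean = pandas.to_numeric(["10", "2.5", "-3"])

# An unparseable string raises ValueError under the default errors='raise'
try:
    pandas.to_numeric(["10", "abc"])
    raised = False
except ValueError:
    raised = True

# errors='coerce' replaces the unparseable element with NaN instead
coerced = pandas.to_numeric(["10", "abc"], errors="coerce")

print(clean, raised, coerced, sep="\n")
```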

1. downcast parameter

Let’s specify the downcast parameter and cast the input argument into different types.

Example 1

Let’s create a Series called Data_points with five float values and convert it to numeric with the downcast parameter set to None.

import pandas

# Create Series with 5 values
Data_points=pandas.Series([56.90,78.94,90.00,52.11,33.67])
print(Data_points,"\n")
print(pandas.to_numeric(Data_points,downcast=None))

Output

The existing data type is float64. After converting the Series to numeric with downcast set to None, the data type is not changed.

Example 2

Let’s create a Series named Data_points with five float values and convert it into numeric with the downcast parameter set to ‘float’.

import pandas

# Create Series with 5 values
Data_points=pandas.Series([56.90,78.94,90.00,52.11,33.67])
print(Data_points,"\n")

# Cast the Series - Data_points to float32
print(pandas.to_numeric(Data_points,downcast='float'))

Output

The existing data type is float64. After converting the Series to numeric with a downcast to float, the data type is cast into float32.

Example 3

  1. Cast the Series – Data_points to integer by converting it to numeric.
  2. Cast the Series – Data_points to signed integer by converting it to numeric.

import pandas

# Create Series with 5 values
Data_points=pandas.Series([56,89,88,54,33])
print(Data_points,"\n")

# Cast the Series - Data_points to integer
print(pandas.to_numeric(Data_points,downcast='integer'),"\n")

# Cast the Series - Data_points to signed
print(pandas.to_numeric(Data_points,downcast='signed'),"\n")

Output

The existing data type is int64. After converting the Series to numeric with downcast set to ‘integer’ and ‘signed’, the Series is converted to int8 in both cases, because all five values fit in that type.

Example 4

Utilize the same Series data and cast into unsigned integers by converting it into numeric.

import pandas

# Create Series with 5 values
Data_points=pandas.Series([56,89,88,54,33])

# Cast the Series - Data_points to unsigned
print(pandas.to_numeric(Data_points,downcast='unsigned'),"\n")

Output

The Series is converted to numeric and downcast to uint8, because all five values fit in that type.
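The examples above all end up with 8-bit types because the values are small. As a rough illustration, the sketch below uses three made-up Series to show that downcast picks the smallest type that can hold the data, and leaves the data type alone when no such type exists.

```python
import pandas

small = pandas.Series([56, 89, 88])      # every value fits in int8
medium = pandas.Series([56, 89, 300])    # 300 is too large for 8 bits
negative = pandas.Series([-5, 89, 88])   # a negative value cannot be unsigned

small_int = pandas.to_numeric(small, downcast="integer")
medium_int = pandas.to_numeric(medium, downcast="integer")
medium_uint = pandas.to_numeric(medium, downcast="unsigned")
negative_uint = pandas.to_numeric(negative, downcast="unsigned")

print(small_int.dtype)      # int8
print(medium_int.dtype)     # int16
print(medium_uint.dtype)    # uint16
print(negative_uint.dtype)  # same as the input, no downcast applied
```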

2. errors parameter

Now we will see how to handle the errors while converting the Series into numeric.

Example 1

Create a Series with five elements, two of which are strings that cannot be parsed as numbers. Set the errors parameter to “ignore” while converting the Series to numeric.

import pandas

# Create Series with 5 elements
Data=pandas.Series(["Linuxhint",79,80.6,100,"Java"])
print(Data,"\n")

# errors = 'ignore'
print(pandas.to_numeric(Data,errors = 'ignore'))

Output

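We can see that both strings remain unchanged in the result. Because parsing failed, the Series is returned as it was, with the object data type. On pandas 2.2 and later, this option is deprecated and emits a FutureWarning.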

Example 2

Set the errors parameter to “coerce” while converting the Series to numeric.

import pandas

# Create Series with 5 elements
Data=pandas.Series(["Linuxhint",79,80.6,100,"Java"])
print(Data,"\n")

# errors = 'coerce'
print(pandas.to_numeric(Data,errors = 'coerce'))

Output

We can see that both strings are converted to NaN and the remaining values are converted to float64.
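After coercing, it is often useful to know which entries failed to parse and to decide what to do with the resulting NaN values. The sketch below reuses the same Series. The variable names and the fill value of 0 are assumptions made for illustration.

```python
import pandas

Data = pandas.Series(["Linuxhint", 79, 80.6, 100, "Java"])
converted = pandas.to_numeric(Data, errors="coerce")

# Entries that are NaN after conversion but were not missing before
invalid = Data[converted.isna() & Data.notna()]
print(invalid.tolist())

# Two common follow-ups: drop the invalid rows or fill them with a default
dropped = converted.dropna()
filled = converted.fillna(0)
print(dropped, filled, sep="\n")
```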

Example 3

Set the errors parameter to “raise” while converting the Series to numeric.

import pandas

# Create Series with 5 elements
Data=pandas.Series(["Linuxhint",79,80.6,100,"Java"])
print(Data,"\n")

# errors = 'raise'
print(pandas.to_numeric(Data,errors = 'raise'))

Error Output

We can see that a ValueError is raised while converting to numeric, because the string “Linuxhint” cannot be parsed as a number.

3. Bonus Example – DataFrame

Let’s see how to convert a DataFrame column to numeric using the to_numeric() function.

import pandas

# Create DataFrame with one column
data_values = pandas.DataFrame([79,80,100], columns=['Values'])
print(data_values,"\n")

# Convert Values column into unsigned-numeric
print(pandas.to_numeric(data_values['Values'],downcast = 'unsigned'),"\n")

# Convert Values column into signed-numeric
print(pandas.to_numeric(data_values['Values'],downcast = 'signed'))

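Output

The Values column is int64. It is converted to uint8 with downcast set to ‘unsigned’ and to int8 with downcast set to ‘signed’.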
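Since to_numeric() accepts only one column at a time, one possible way to convert several columns is to hand each of them to the function through DataFrame.apply(). This is a minimal sketch. The frame, its column names, and the choice of errors='coerce' are assumptions made for illustration.

```python
import pandas

# Hypothetical two-column frame holding numbers stored as strings
frame = pandas.DataFrame({
    "Values": ["79", "80", "100"],
    "Scores": ["1.5", "x", "3"],
})

# apply() passes each column (a Series) to to_numeric() in turn,
# and forwards errors='coerce' as a keyword argument
converted = frame.apply(pandas.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```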

Conclusion

The pandas.to_numeric() function converts the specified argument to a numeric type. It supports only one-dimensional data such as a Series, list, or tuple. For a pandas DataFrame, one column at a time is passed to this function. We discussed how to reduce the result type with the downcast parameter and how to handle invalid parsing by specifying the errors parameter.

About the author

Gottumukkala Sravan Kumar

B.Tech (Hons) in Information Technology; known programming languages: Python, R, PHP, MySQL; published 500+ articles in the computer science domain