This guide covers pandas interpolation through the following topics:
- What is the “DataFrame.interpolate()” Method in Python?
- Fill the Missing Values
- Fill the Missing Values in the Backward Direction
- Fill a Maximum Number of Missing Values
- Fill the Missing Values by Specifying the Area to Be Interpolated
What is the “DataFrame.interpolate()” Method in Python?
In pandas, the “DataFrame.interpolate()” method fills the missing (NaN) values in a Series or DataFrame. It replaces each NaN value with a value estimated from the neighboring data, using the specified interpolation method.
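As a minimal sketch of the idea, the snippet below builds a small Series (the values are made up for illustration) and fills its single gap with the default linear method:

```python
import pandas

# A Series with one missing value between two known values.
# None in a float list is stored as NaN.
s = pandas.Series([1.0, None, 3.0])

# With no arguments, interpolate() uses the linear method,
# so the gap becomes the midpoint of its two neighbors.
filled = s.interpolate()
print(filled.tolist())  # [1.0, 2.0, 3.0]
```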
Syntax
DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction=None, limit_area=None, downcast=None, **kwargs)
Parameters
In the above syntax:
- The “method” parameter specifies the interpolation technique used to fill the missing values. Possible values include “linear” (the default), “pad”, “zero”, “cubic”, “polynomial”, and others. Each value determines how the NaN values are filled, and several of them, such as “cubic” and “polynomial”, require SciPy.
- The “axis” parameter is the axis to interpolate along. It can be 0 for the index (values are interpolated down each column) or 1 for the columns (values are interpolated across each row).
- The “limit” parameter specifies the maximum number of consecutive NaNs to fill. It must be greater than 0.
- The “inplace” parameter is a Boolean (“True” or “False”) that specifies whether to update the data in place if possible.
- The “limit_direction” parameter is the direction in which consecutive NaNs are filled: “forward”, “backward”, or “both”.
- The “limit_area” parameter restricts which NaNs are filled: “inside” fills only NaNs surrounded by valid values, “outside” fills only NaNs outside the valid values, and None applies no restriction.
- Lastly, the “downcast” parameter is an optional argument that specifies whether to downcast dtypes if possible.
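To make the “method” and “axis” parameters more concrete, here is a small sketch. The data and variable names are invented for illustration. It compares “linear”, which treats rows as equally spaced, with “index”, which uses the numeric index values, and then interpolates across the columns with “axis=1”:

```python
import pandas

# Values recorded at unevenly spaced index positions 0, 1 and 10.
s = pandas.Series([0.0, None, 10.0], index=[0, 1, 10])

# "linear" ignores the index and treats the rows as equally spaced.
by_position = s.interpolate(method="linear")
print(by_position.tolist())  # [0.0, 5.0, 10.0]

# "index" uses the numeric index values as x-coordinates.
by_index = s.interpolate(method="index")
print(by_index.tolist())  # [0.0, 1.0, 10.0]

# axis=1 interpolates across the columns of each row.
nan = float("nan")
df = pandas.DataFrame({"a": [1.0, 10.0], "b": [nan, nan], "c": [3.0, 30.0]})
across = df.interpolate(axis=1)
print(across["b"].tolist())  # [2.0, 20.0]
```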
For more details, refer to the official pandas documentation.
Return Value
The “DataFrame.interpolate()” method returns a Series or DataFrame of the same shape as the input, with the NaN values interpolated, or None when “inplace=True” is used.
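The following sketch, using a made-up two-column DataFrame, illustrates that the returned object keeps the shape of the input and that the original is left untouched unless “inplace=True” is passed. The return value of the in-place call is deliberately not inspected here:

```python
import pandas

df = pandas.DataFrame({"x": [1.0, None, 3.0], "y": [10.0, 20.0, None]})

# Without inplace, a new DataFrame of the same shape is returned.
result = df.interpolate()
print(result.shape == df.shape)

# The original still has its two NaNs; the result has none.
missing_before = int(df.isna().sum().sum())
missing_in_result = int(result.isna().sum().sum())
print(missing_before, missing_in_result)

# With inplace=True, the original DataFrame itself is updated.
df.interpolate(inplace=True)
missing_after = int(df.isna().sum().sum())
print(missing_after)
```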
Example 1: Using “DataFrame.interpolate()” Method to Fill the Missing Values
In the code below, we first import pandas and create a DataFrame whose numeric columns contain None values, which pandas stores as NaN. Next, the “df.interpolate()” method fills each NaN with a value spaced evenly between the previous and next valid values in the same column, ignoring the index. A missing value in the first row cannot be filled because the default fill direction is forward and there is no previous value. Trailing missing values, in contrast, are filled with the last valid value:
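import pandas

# None in a numeric column is stored as NaN
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]})
print(df, '\n')
# Fill the NaN values using linear interpolation (the default method)
df1 = df.interpolate(method='linear')
print(df1)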
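The output shows the DataFrame interpolated with the default linear method: the interior gaps are filled, the trailing NaNs take the last valid value of their column, and the leading NaN in “Score_2” remains.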
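The filled values can be checked with a short sketch. It reuses the score columns from the example but, as an assumption made only for this illustration, leaves out the non-numeric “Team” column so that only numeric data is interpolated:

```python
import pandas

# Numeric columns only (the "Team" column is omitted in this sketch).
scores = pandas.DataFrame({
    "Score_1": [12, 32, None, None, 45, None],
    "Score_2": [None, 23, 33, None, 45, 55],
    "Score_3": [23, 32, 31, None, None, None],
})

filled = scores.interpolate(method="linear")

# Score_1: the gap between 32 and 45 is split into equal steps
# (36.33, 40.67) and the trailing NaN takes the last valid value.
print(filled["Score_1"].round(2).tolist())

# Score_2: the leading NaN stays, the interior gap becomes 39.0.
print(filled["Score_2"].tolist())

# Score_3: all three trailing NaNs take the last valid value, 31.0.
print(filled["Score_3"].tolist())
```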
Example 2: Using “DataFrame.interpolate()” Method to Fill the Missing Values in the Backward Direction
We can also interpolate in the backward direction, just as we did in the forward direction in the previous example. To do so, the “limit_direction” parameter with the “backward” value is passed to the “DataFrame.interpolate()” method. In the backward direction, a missing value in the last row cannot be filled because no row follows it from which the value could be interpolated:
import pandas

# None in a numeric column is stored as NaN
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]})
print(df, '\n')
# Fill the NaN values in the backward direction
df1 = df.interpolate(method='linear', limit_direction='backward')
print(df1)
In the output of the backward interpolation, the leading NaN in “Score_2” is filled, while the trailing NaNs in “Score_1” and “Score_3” remain.
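To compare the available directions side by side, the sketch below applies “forward”, “backward”, and “both” to a single made-up Series that has a leading, an interior, and a trailing NaN:

```python
import pandas

s = pandas.Series([None, 23.0, None, 45.0, None])

# forward: the leading NaN stays, the trailing NaN is filled.
forward = s.interpolate(limit_direction="forward")

# backward: the leading NaN is filled, the trailing NaN stays.
backward = s.interpolate(limit_direction="backward")

# both: leading and trailing NaNs are filled.
both = s.interpolate(limit_direction="both")

print(forward.tolist())
print(backward.tolist())
print(both.tolist())  # [23.0, 23.0, 34.0, 45.0, 45.0]
```

In all three cases the interior gap receives the same linear value, 34.0; the direction only decides which NaNs at the edges are filled.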
Example 3: Using “DataFrame.interpolate()” Method to Fill a Maximum Number of Missing Values
We can also specify the maximum number of consecutive missing values to interpolate. If this value is not set, all consecutive NaN values are interpolated by default. In this code, the “df.interpolate()” method takes “limit=1” as an argument and fills at most one missing value in each run of consecutive NaNs in every column of the DataFrame:
import pandas

# None in a numeric column is stored as NaN
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]})
print(df, '\n')
# Fill at most one NaN in each run of consecutive NaNs
df1 = df.interpolate(method='linear', limit=1)
print(df1)
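In the output, at most one NaN is filled in each run of consecutive NaNs, so the second NaN in the “Score_1” gap and the last two NaNs in “Score_3” remain.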
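The effect of “limit” can be sketched on two small Series taken from the score columns. One detail worth noting: a value that is filled inside a longer gap is still the value of the full linear interpolation, and “limit” only decides how many of those values are kept:

```python
import pandas

# Three trailing NaNs (same values as the Score_3 column).
tail = pandas.Series([23.0, 32.0, 31.0, None, None, None])
one = tail.interpolate(limit=1)
two = tail.interpolate(limit=2)
print(one.tolist())  # one trailing NaN filled, two remain
print(two.tolist())  # two trailing NaNs filled, one remains

# A two-NaN gap between 32 and 45.
gap = pandas.Series([32.0, None, None, 45.0])
limited_gap = gap.interpolate(limit=1)

# The first NaN gets 36.33 (one third of the way from 32 to 45),
# not the midpoint; the second NaN is left unfilled.
print(limited_gap.round(2).tolist())
```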
Example 4: Using “DataFrame.interpolate()” Method to Fill the Missing Values by Specifying the Area to Be Interpolated
In this code, we specify the area of interpolation using the “limit_area” parameter. First, the “DataFrame.interpolate()” method takes the “limit_area” value “inside” to fill only the missing values that are surrounded by existing values in the same column. Then, the value “outside” is passed to the method to fill only the missing values that are not surrounded by existing values in the same column:
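import pandas

# None in a numeric column is stored as NaN
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]})
print(df, '\n')
# Fill only the NaN values surrounded by valid values
df1 = df.interpolate(limit_area='inside')
print(df1, '\n')
# Fill only the NaN values outside the valid values
df2 = df.interpolate(limit_area='outside')
print(df2)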
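In the output, the first DataFrame has only its interior gaps filled, while the second has only its trailing NaNs filled because the default direction is forward.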
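The interaction between “limit_area” and “limit_direction” can be sketched on one made-up Series with a leading, an interior, and a trailing NaN. Combining “outside” with “both” is an addition for illustration and is not part of the example above:

```python
import pandas

s = pandas.Series([None, 23.0, 33.0, None, 45.0, None])

# inside: only the NaN between 33 and 45 is filled.
inside = s.interpolate(limit_area="inside")

# outside: with the default forward direction, only the
# trailing NaN is filled; the leading one stays.
outside = s.interpolate(limit_area="outside")

# outside + both: leading and trailing NaNs are filled,
# and the interior gap is left alone.
outside_both = s.interpolate(limit_area="outside", limit_direction="both")

print(inside.tolist())
print(outside.tolist())
print(outside_both.tolist())
```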
Conclusion
The “DataFrame.interpolate()” method is used in pandas to fill the missing (NaN) values of a DataFrame or Series based on the specified method. We can fill the missing values in the forward or backward direction using the “limit_direction” parameter, limit the maximum number of consecutive NaN values to fill using the “limit” parameter, and restrict the filled area using the “limit_area” parameter. This write-up covered pandas interpolation in detail through several examples.