Python | Pandas dataframe.interpolate()

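Python is a great language for data analysis, largely because of its ecosystem of data-centric packages. Pandas is one of these packages, and it makes importing and analyzing data much easier.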

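The Pandas dataframe.interpolate() function is mainly used to fill NA values in a DataFrame or Series. It is a powerful way to fill missing values: instead of hard-coding a replacement value, it estimates each missing value with one of several interpolation techniques.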
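To make the contrast with hard-coding concrete, here is a minimal sketch on a small Series. The values are made up for illustration and are not from the article. `fillna(0)` writes the same constant into every gap, while `interpolate()` estimates each gap from the neighbouring values.

```python
import numpy as np
import pandas as pd

# Illustrative Series with two gaps (made-up data)
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# Hard-coded fill: every NaN becomes the same constant
filled_const = s.fillna(0)

# Interpolated fill: each NaN is estimated from its neighbours (linear by default)
filled_interp = s.interpolate()

print(pd.DataFrame({"original": s,
                    "fillna(0)": filled_const,
                    "interpolate()": filled_interp}))
```

The constant fill produces 0 in both gaps, whereas linear interpolation produces 2 and 4, which follow the trend of the surrounding data.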

Syntax: DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)

Parameters :
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh', 'polynomial', 'spline', 'piecewise_polynomial', 'from_derivatives', 'pchip', 'akima'}, default 'linear'. The interpolation technique to use.

axis : 0 to fill column by column, 1 to fill row by row.
limit : Maximum number of consecutive NaNs to fill. Must be greater than 0.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'. Consecutive NaNs will be filled in this direction.
limit_area : {None, 'inside', 'outside'}, default None. None: no fill restriction. 'inside': only fill NaNs surrounded by valid values (interpolate). 'outside': only fill NaNs outside valid values (extrapolate).
inplace : Update the NDFrame in place if possible.
downcast : Downcast dtypes if possible.
kwargs : keyword arguments to pass on to the interpolating function.

Returns : Series or DataFrame of the same shape with the NaNs interpolated, or None if inplace=True.
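The limit_area and axis parameters are easiest to understand on tiny inputs. The sketch below uses made-up data and assumes a reasonably recent pandas version. It sets limit_direction='both' so that the fill direction does not mask the effect of limit_area.

```python
import numpy as np
import pandas as pd

# Illustrative Series: NaNs at both ends and one in the middle
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# 'inside': only the NaN surrounded by valid values is filled
inside = s.interpolate(limit_direction="both", limit_area="inside")

# 'outside': only the leading/trailing NaNs are filled (extrapolation)
outside = s.interpolate(limit_direction="both", limit_area="outside")

print(pd.DataFrame({"original": s, "inside": inside, "outside": outside}))

# axis: column "y" is entirely NaN, so filling column by column (axis=0)
# has nothing to interpolate from, while filling row by row (axis=1)
# interpolates between "x" and "z" in each row.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [np.nan, np.nan], "z": [3.0, 6.0]})
col_wise = df.interpolate(axis=0)
row_wise = df.interpolate(axis=1)

print(col_wise)
print(row_wise)
```

With axis=1, each row is treated as its own sequence, so "y" becomes 2.0 in the first row and 4.0 in the second.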


Example #1: Use the interpolate() function to fill missing values with the linear method.

# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.DataFrame({"A":[12, 4, 5, None, 1],
                   "B":[None, 2, 54, 3, None],
                   "C":[20, 16, None, 3, 8],
                   "D":[14, 3, None, None, 6]})
  
# Print the dataframe
print(df)

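Let's interpolate the missing values using the linear method. Note that the linear method ignores the index and treats the values as equally spaced.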
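To see what "ignores the index" means in practice, here is a separate minimal sketch with an illustrative Series whose index is unevenly spaced. It compares method='linear' with method='index', which uses the index values as the x-coordinates instead of the row positions.

```python
import numpy as np
import pandas as pd

# Illustrative Series: the index labels 0, 1, 10 are NOT evenly spaced
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' uses row positions 0, 1, 2, so the gap is the midpoint
by_position = s.interpolate(method="linear")

# 'index' uses the index labels as x, so x=1 lies close to x=0
by_index = s.interpolate(method="index")

print(pd.DataFrame({"linear": by_position, "index": by_index}))
```

The linear method gives 5.0 for the missing value, while method='index' gives 1.0. When the index is evenly spaced (such as the default RangeIndex used in the examples here), the two methods agree.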

# Interpolate the missing values in the forward direction
print(df.interpolate(method='linear', limit_direction='forward'))

Output:

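As we can see in the output, the missing value in the first row (column B) could not be filled: the fill direction is forward, and there is no earlier value to interpolate from.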
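The sketch below re-runs Example #1 and checks that claim. It also shows, as an addition not covered by the original example, that limit_direction='both' fills the leading NaN as well. In the forward case, the trailing NaN in column B is filled with the last valid value (3.0), because linear interpolation has no right-hand point to use there.

```python
import pandas as pd

# Same dataframe as in Example #1
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

# Forward only: the leading NaN in column B stays NaN
forward = df.interpolate(method="linear", limit_direction="forward")

# Both directions: the leading NaN in B is filled too
both = df.interpolate(method="linear", limit_direction="both")

print(forward)
print(both)
```

In the forward result, A row 3 becomes 3.0, C row 2 becomes 9.5, and D rows 2 and 3 become 4.0 and 5.0. With limit_direction='both', column B row 0 becomes 2.0 and no NaNs remain.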

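Example #2: Use the interpolate() function to fill missing values backward with the linear method, limiting the maximum number of consecutive NaN values that can be filled.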

# importing pandas as pd
import pandas as pd
  
# Creating the dataframe 
df = pd.DataFrame({"A":[12, 4, 5, None, 1],
                   "B":[None, 2, 54, 3, None],
                   "C":[20, 16, None, 3, 8],
                   "D":[14, 3, None, None, 6]})
  
# Interpolate backward, filling at most 1 consecutive NaN
print(df.interpolate(method='linear', limit_direction='backward', limit=1))

Output:

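Note the fourth column (D): because we set limit to 1, only one of its two consecutive missing values was filled, namely the one in row 3, next to the following valid value. The missing value in the last row (column B) could not be filled because there is no row after it to interpolate from.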
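The sketch below re-runs Example #2 and checks the behaviour described above. The limit=2 call is an extra comparison, not part of the original example. It shows that raising the limit lets both consecutive NaNs in column D be filled, while the trailing NaN in column B still stays empty in the backward direction.

```python
import pandas as pd

# Same dataframe as in Example #2
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

# As in Example #2: backward, at most 1 consecutive NaN filled
result = df.interpolate(method="linear", limit_direction="backward", limit=1)

# Extra comparison: allow up to 2 consecutive NaNs
result2 = df.interpolate(method="linear", limit_direction="backward", limit=2)

print(result)
print(result2)
```

With limit=1, column D becomes 14.0, 3.0, NaN, 5.0, 6.0, and the leading NaN in B is filled with 2.0. With limit=2, D is fully filled as 14.0, 3.0, 4.0, 5.0, 6.0.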