📅 Last modified: 2023-12-03 15:04:26.687000    🧑 Author: Mango
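In data analysis and processing, missing values often need to be filled or interpolated. The pandas library for Python provides the .interpolate() method for this purpose.
DataFrame.interpolate() fills missing values (NaN) by interpolation. It supports several interpolation methods, including linear, nearest-neighbor, and polynomial interpolation. The 'nearest' and 'polynomial' methods rely on SciPy being installed.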
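Syntax:
DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None, downcast=None, **kwargs)
This signature matches older pandas releases. In more recent releases the default of limit_direction is None (which still behaves as 'forward' for most methods), and downcast is deprecated.
Return value: a DataFrame with the missing values interpolated, or None if inplace=True.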
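The limit and limit_direction parameters in the signature control how many consecutive NaN values are filled and from which side of a gap. A minimal sketch, assuming a small made-up column (the data and variable names are illustrative only):

```python
import numpy as np
import pandas as pd

# Illustrative data: a gap of three consecutive NaN values between 1 and 5.
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, np.nan, 5.0]})

# Default settings: linear interpolation fills the whole gap.
full = df.interpolate()

# limit=1 fills at most one consecutive NaN, counted from the left (forward).
forward_one = df.interpolate(limit=1)

# limit_direction='backward' counts from the right-hand side of the gap instead.
backward_one = df.interpolate(limit=1, limit_direction="backward")

print(full["A"].tolist())          # [1.0, 2.0, 3.0, 4.0, 5.0]
print(forward_one["A"].tolist())   # [1.0, 2.0, nan, nan, 5.0]
print(backward_one["A"].tolist())  # [1.0, nan, nan, 4.0, 5.0]
```

The filled values are the same in all three cases; limit only decides which positions are allowed to receive them.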
The following simple examples demonstrate how to use DataFrame.interpolate().
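# Example 1: linear interpolation (the default method)
import pandas as pd
import numpy as np
# One missing value per column: in the middle (A), at the start (B), at the end (C)
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [np.nan, 5, 6], 'C': [7, 8, np.nan]})
print(df)
# Fill NaN values column by column using linear interpolation
df_interp = df.interpolate()
print(df_interp)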
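Output:
     A    B    C
0  1.0  NaN  7.0
1  NaN  5.0  8.0
2  3.0  6.0  NaN
     A    B    C
0  1.0  NaN  7.0
1  2.0  5.0  8.0
2  3.0  6.0  8.0
With the default forward direction, the leading NaN in column B is left unfilled, while the trailing NaN in column C is filled with the last valid value (8.0).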
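How NaN values at the edges of a column are treated can be adjusted with limit_direction and limit_area. The sketch below reuses the data from the example above; the variable names both and inside are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, np.nan, 3], "B": [np.nan, 5, 6], "C": [7, 8, np.nan]})

# Fill in both directions: the leading NaN in B takes the nearest valid value (5.0).
both = df.interpolate(limit_direction="both")

# limit_area='inside' only fills NaN values that have valid values on both sides,
# so the leading NaN in B and the trailing NaN in C are left untouched.
inside = df.interpolate(limit_area="inside")

print(both)
print(inside)
```

Using limit_area='inside' is a reasonable choice when extending the first or last observation beyond the known range would be misleading.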
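# Example 2: nearest-neighbor interpolation (requires SciPy)
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, np.nan, 4, 5], 'B': [2, 3, 4, np.nan, 6], 'C': [7, np.nan, 9, 10, 11]})
print(df)
# Each NaN is replaced by the value of the closest valid row
df_interp = df.interpolate(method='nearest')
print(df_interp)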
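Output:
     A    B     C
0  1.0  2.0   7.0
1  NaN  3.0   NaN
2  NaN  4.0   9.0
3  4.0  NaN  10.0
4  5.0  6.0  11.0
     A    B     C
0  1.0  2.0   7.0
1  1.0  3.0   7.0
2  4.0  4.0   9.0
3  4.0  4.0  10.0
4  5.0  6.0  11.0
In this output, a missing value that is equally close to two neighbors (row 3 of B, row 1 of C) takes the earlier value.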
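Methods such as 'nearest' and 'polynomial' use the numeric index values as x coordinates, whereas the default 'linear' method ignores the index and treats rows as equally spaced. The difference only shows up when the index is unevenly spaced. A minimal sketch using method='index', which needs no SciPy; the index values 0, 1, 10 are made up for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data: an unevenly spaced index with one missing value at index 1.
df = pd.DataFrame({"A": [0.0, np.nan, 10.0]}, index=[0, 1, 10])

# 'linear' treats the three rows as equally spaced, so the gap gets the midpoint.
linear = df.interpolate(method="linear")

# 'index' uses the actual index values, so index 1 sits one tenth of the way to 10.
by_index = df.interpolate(method="index")

print(linear["A"].tolist())    # [0.0, 5.0, 10.0]
print(by_index["A"].tolist())  # [0.0, 1.0, 10.0]
```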
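# Example 3: polynomial interpolation of order 2 (requires SciPy)
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3, 4, 5], 'B': [2, 4, 6, np.nan, 10], 'C': [7, 8, np.nan, 10, 11]})
print(df)
# The polynomial method needs an explicit order
df_interp = df.interpolate(method='polynomial', order=2)
print(df_interp)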
Output:
     A     B     C
0  1.0   2.0   7.0
1  NaN   4.0   8.0
2  3.0   6.0   NaN
3  4.0   NaN  10.0
4  5.0  10.0  11.0
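     A     B     C
0  1.0   2.0   7.0
1  2.0   4.0   8.0
2  3.0   6.0   9.0
3  4.0   8.0  10.0
4  5.0  10.0  11.0
Note: the known values in each column lie on a straight line, so second-order interpolation reproduces that line and yields 2.0, 8.0, and 9.0. The printed values may show more decimal places or tiny floating-point deviations depending on the pandas and SciPy versions.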
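To see what a second-order fit does on curved data, the sketch below fills a gap by hand with NumPy. It is only an illustration of the idea: pandas delegates method='polynomial' to SciPy, and this is not how pandas computes it internally. The data (y = x**2 with one value missing) and all variable names are assumptions made for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical curved data: y = x**2 at x = 0..4, with the value at x = 3 missing.
s = pd.Series([0.0, 1.0, 4.0, np.nan, 16.0])

# Fit a degree-2 polynomial through the known points, using the index as x.
known = s.dropna()
coeffs = np.polyfit(known.index.to_numpy(dtype=float), known.to_numpy(), deg=2)

# Evaluate the fitted polynomial at the missing positions and fill them in.
missing = s.index[s.isna()]
filled = s.copy()
filled.loc[missing] = np.polyval(coeffs, missing.to_numpy(dtype=float))

print(filled.round(6).tolist())
```

Because the known points lie exactly on a parabola, the filled value at x = 3 comes out at about 9.0, whereas a straight line between the neighbors 4 and 16 would give 10.0.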