📅  Last modified: 2023-12-03 14:45:02.794000             🧑  Author: Mango
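This tutorial introduces the basics of the pandas Python library: what it is, how to use it, and where it applies.
pandas is an open-source data analysis library built on NumPy. It is used for data mining, data analysis, data cleaning, data visualization, and related work.
pandas can be installed with pip: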
pip install pandas
import numpy as np  # the examples below use np.nan and np.random
import pandas as pd
A Series can be created with the following statement:
s = pd.Series([1,3,5,np.nan,6,8])
print(s)
Output:
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
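The dtype in the output above is float64 even though most inputs are integers, because NaN is a floating-point value and the whole Series is upcast to hold it. A minimal sketch of how the missing entry can be counted and how it affects an aggregate; the variable names missing_count and mean_skipping_nan are illustrative only:

```python
import numpy as np
import pandas as pd

# Same values as the Series above; np.nan marks a missing entry.
s = pd.Series([1, 3, 5, np.nan, 6, 8])

# isna() returns a boolean Series; summing it counts the True values.
missing_count = int(s.isna().sum())

# mean() skips NaN by default (skipna=True), so it averages the 5 real values.
mean_skipping_nan = s.mean()

print(missing_count)
print(mean_skipping_nan)
```

Passing skipna=False to mean() would instead propagate the NaN into the result.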
A DataFrame can be created with the following statements:
dates = pd.date_range('20160101', periods=6)
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
print(df)
Output (the values are random, so yours will differ):
                   A         B         C         D
2016-01-01 -0.325459 -1.255455  1.756833 -0.985661
2016-01-02  0.157541 -1.094029 -1.162399  1.365767
2016-01-03  0.073426  2.537908 -1.328294 -0.237003
2016-01-04 -1.583608 -0.538704 -0.882628 -0.154218
2016-01-05 -1.048464  1.697358  1.008330  0.859932
2016-01-06  1.527778  1.178987 -0.106862 -0.400041
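Because np.random.randn draws new values on every run, the numbers printed above cannot be reproduced exactly. One way to get repeatable data, sketched here as an assumption and not as part of the original tutorial, is NumPy's seeded Generator. The seed 0 and the names rng, df_seeded, and df_again are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20160101", periods=6)

# A Generator seeded with a fixed value yields the same numbers on every run.
rng = np.random.default_rng(0)
df_seeded = pd.DataFrame(
    rng.standard_normal((6, 4)), index=dates, columns=list("ABCD")
)

# Rebuilding from a fresh Generator with the same seed gives an identical frame.
df_again = pd.DataFrame(
    np.random.default_rng(0).standard_normal((6, 4)),
    index=dates,
    columns=list("ABCD"),
)

print(df_seeded.shape)
print(df_seeded.equals(df_again))
```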
# Show the first five rows (head() defaults to n=5)
print(df.head())
Output:
                   A         B         C         D
2016-01-01 -0.325459 -1.255455  1.756833 -0.985661
2016-01-02  0.157541 -1.094029 -1.162399  1.365767
2016-01-03  0.073426  2.537908 -1.328294 -0.237003
2016-01-04 -1.583608 -0.538704 -0.882628 -0.154218
2016-01-05 -1.048464  1.697358  1.008330  0.859932
# Show the last five rows (tail() defaults to n=5)
print(df.tail())
Output:
                   A         B         C         D
2016-01-02  0.157541 -1.094029 -1.162399  1.365767
2016-01-03  0.073426  2.537908 -1.328294 -0.237003
2016-01-04 -1.583608 -0.538704 -0.882628 -0.154218
2016-01-05 -1.048464  1.697358  1.008330  0.859932
2016-01-06  1.527778  1.178987 -0.106862 -0.400041
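Both head() and tail() accept an optional row count, which is why the six-row frame above shows five rows each time. A small sketch with a hypothetical ten-row frame named numbers, chosen so that the selected rows are easy to check:

```python
import pandas as pd

# A hypothetical frame with a single column holding 0..9.
numbers = pd.DataFrame({"n": range(10)})

first_two = numbers.head(2)   # rows at positions 0 and 1
last_three = numbers.tail(3)  # rows at positions 7, 8 and 9

print(first_two)
print(last_three)
```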
# Inspect the row labels (index) and the column labels
print(df.index)
print(df.columns)
Output:
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06'],
dtype='datetime64[ns]', freq='D')
Index(['A', 'B', 'C', 'D'], dtype='object')
# Show the data type of each column
print(df.dtypes)
Output:
A float64
B float64
C float64
D float64
dtype: object
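The last line of the output above reads dtype: object because df.dtypes is itself a Series whose values are dtype objects, one per column. Each column carries its own type, which is easier to see in a frame with mixed columns. The frame mixed and its column names below are made up for illustration, and the exact dtype reported for the text column may depend on the pandas version:

```python
import pandas as pd

# A hypothetical frame with an integer, a float and a text column.
mixed = pd.DataFrame(
    {
        "count": [1, 2, 3],
        "price": [0.5, 1.5, 2.5],
        "name": ["a", "b", "c"],
    }
)

# dtypes is a Series indexed by column name.
column_types = mixed.dtypes
print(column_types)

# Individual entries can be looked up by column label.
print(column_types["price"])
```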
# Select a single column, which returns a Series
print(df['A'])
Output:
2016-01-01 -0.325459
2016-01-02 0.157541
2016-01-03 0.073426
2016-01-04 -1.583608
2016-01-05 -1.048464
2016-01-06 1.527778
Freq: D, Name: A, dtype: float64
# Slice rows by position: the first three rows
print(df[0:3])
Output:
                   A         B         C         D
2016-01-01 -0.325459 -1.255455  1.756833 -0.985661
2016-01-02  0.157541 -1.094029 -1.162399  1.365767
2016-01-03  0.073426  2.537908 -1.328294 -0.237003
# Select by label: both ends of the date slice are included
print(df.loc['20160102':'20160104',['A','B']])
Output:
                   A         B
2016-01-02  0.157541 -1.094029
2016-01-03  0.073426  2.537908
2016-01-04 -1.583608 -0.538704
# Select a single value by integer position: row 3, column 1 (zero-based)
print(df.iloc[3,1])
Output:
-0.538704336731
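The selection examples above mix two addressing schemes: loc works with labels and includes both ends of a slice, while iloc works with integer positions and excludes the end of a slice. The sketch below uses a small hypothetical frame named small with fixed values so that the two can be compared directly; it also shows boolean filtering, which the examples above do not cover:

```python
import pandas as pd

dates = pd.date_range("20160101", periods=4)
# A hypothetical frame with fixed values, so results are easy to verify.
small = pd.DataFrame({"A": [10, 20, 30, 40], "B": [1, 2, 3, 4]}, index=dates)

# Label-based: the slice includes both 2016-01-02 and 2016-01-03.
by_label = small.loc["20160102":"20160103", ["A"]]

# Position-based: 1:3 covers positions 1 and 2, the end is excluded.
by_position = small.iloc[1:3, [0]]

# A single cell by position: row 3, column 1.
single = small.iloc[3, 1]

# Boolean filtering: keep only the rows where column A exceeds 15.
above_15 = small[small["A"] > 15]

print(by_label)
print(by_position)
print(single)
print(above_15)
```

Here by_label and by_position select the same two rows, even though the slice bounds are written differently.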
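# Reindex to the first four dates and add a new column E (filled with NaN)
df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ['E'])
# Set E to 1 for the first two dates
df1.loc[dates[0]:dates[1],'E'] = 1
print(df1)
# Drop every row that contains a missing value
print(df1.dropna())
# Replace missing values with 2
print(df1.fillna(value=2))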
Output:
                   A         B         C         D    E
2016-01-01 -0.325459 -1.255455  1.756833 -0.985661  1.0
2016-01-02  0.157541 -1.094029 -1.162399  1.365767  1.0
2016-01-03  0.073426  2.537908 -1.328294 -0.237003  NaN
2016-01-04 -1.583608 -0.538704 -0.882628 -0.154218  NaN

                   A         B         C         D    E
2016-01-01 -0.325459 -1.255455  1.756833 -0.985661  1.0
2016-01-02  0.157541 -1.094029 -1.162399  1.365767  1.0

                   A         B         C         D    E
2016-01-01 -0.325459 -1.255455  1.756833 -0.985661  1.0
2016-01-02  0.157541 -1.094029 -1.162399  1.365767  1.0
2016-01-03  0.073426  2.537908 -1.328294 -0.237003  2.0
2016-01-04 -1.583608 -0.538704 -0.882628 -0.154218  2.0
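Note that dropna() and fillna() each return a new DataFrame and leave the original untouched, which is why df1 can be reused for both calls above. A minimal sketch on a hypothetical frame named frame with fixed values, also showing isna() for locating the gaps first:

```python
import numpy as np
import pandas as pd

# A hypothetical frame whose column E is missing in the last two rows.
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "E": [1.0, 1.0, np.nan, np.nan]}
)

mask = frame.isna()             # True wherever a value is missing
dropped = frame.dropna()        # keeps only the rows without any NaN
filled = frame.fillna(value=2)  # replaces each NaN with 2

# The original frame still contains its two missing values.
remaining_gaps = int(frame["E"].isna().sum())

print(mask)
print(dropped)
print(filled)
print(remaining_gaps)
```

To keep a result, assign it to a name, as done here with dropped and filled.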
That covers the basics of the pandas Python library, how to use it, and where it applies. For more detail, see the official pandas documentation.