Python | Read csv using pandas.read_csv()
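Python is a great language for data analysis, primarily because of its rich ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
Import pandas: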
import pandas as pd
Code #1: read_csv is an important pandas function for reading csv files and operating on them.
Python3
# Import pandas
import pandas as pd
# reading csv file
pd.read_csv("filename.csv")
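Opening a CSV file with it is easy. The function can do much more, though, because its parameters can completely change the object that is returned. For example, read_csv can read a csv file not only from local disk but also from a URL, and it can load only the columns you need, so that the data frame does not have to be edited afterwards.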
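A minimal sketch of column selection at read time. The sample data and its column names are hypothetical, and an in-memory buffer stands in for a local file or URL so the snippet runs without network access; a URL string could be passed in the same position.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk or at a URL.
csv_text = """Name,Type,HP
Bulbasaur,Grass,45
Charmander,Fire,39
Squirtle,Water,44
"""

# read_csv accepts a path, a URL string, or any file-like object.
# StringIO is used here so the example needs no file and no network.
full = pd.read_csv(io.StringIO(csv_text))

# usecols loads only the named columns, so nothing has to be dropped later.
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "HP"])

print(full.shape)            # (3, 3)
print(list(subset.columns))  # ['Name', 'HP']
```

Selecting columns while reading also avoids parsing data that would be discarded anyway, which can matter for wide files.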
Here is the list of parameters it takes, with their default values. This signature comes from an older pandas release; some of these parameters (for example squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines) have since been deprecated or removed.
pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
Not all of them are important, but remembering them can save you from writing the same functionality yourself. In a jupyter notebook, pressing shift + tab shows the parameters of any function. The most useful ones and their usage are given below:
Parameter            Use
filepath_or_buffer   URL or Dir location of file
sep                  Stands for separator, default is ',' as in csv (comma separated values)
index_col            Makes passed column as index instead of 0, 1, 2, 3…
header               Makes passed row/s [int/int list] as header
usecols              Only uses the passed col [string list] to make data frame
squeeze              If true and only one column is passed, returns pandas series (removed in pandas 2.0)
skiprows             Skips passed rows in new data frame
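The parameters listed above can be combined in a single call. The following is a small sketch on hypothetical semicolon-separated data with a junk first line; the data and column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data with an unwanted first line.
raw = """exported by some tool
Name;Type;HP
Bulbasaur;Grass;45
Charmander;Fire;39
Squirtle;Water;44
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",           # fields are separated by semicolons, not commas
    skiprows=[0],      # drop the first physical line of the file
    header=0,          # the first remaining line holds the column names
    index_col="Name",  # use the Name column as the row index
)

print(df.index.tolist())           # ['Bulbasaur', 'Charmander', 'Squirtle']
print(df.columns.tolist())         # ['Type', 'HP']
print(df.loc["Charmander", "HP"])  # 39
```

Note that header is counted after skiprows has been applied, which is why header=0 picks up the "Name;Type;HP" line here.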
Refer to the link for the dataset used here.
Code #2:
Python3
# importing Pandas library
import pandas as pd

# read the whole file; filepath_or_buffer is the first parameter
pd.read_csv(filepath_or_buffer="pokemon.csv")

# use rows 1 and 2 as a two-level (MultiIndex) header
pd.read_csv("pokemon.csv", header=[1, 2])

# use the 'Type' column as the index instead of 0, 1, 2, 3....
pd.read_csv("pokemon.csv", index_col='Type')

# load only the passed columns into the data frame
pd.read_csv("pokemon.csv", usecols=["Type"])

# return a pandas Series when only one column is read
# (the squeeze=True keyword was removed in pandas 2.0)
pd.read_csv("pokemon.csv", usecols=["Type"]).squeeze("columns")

# skip the passed rows (file line numbers) in the new data frame
pd.read_csv("pokemon.csv", skiprows=[1, 2, 3, 4])
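Two of the calls above are easier to follow on concrete data. In pandas 2.0 and later, read_csv no longer accepts the squeeze keyword, and DataFrame.squeeze("columns") gives the same result. The sketch below uses hypothetical in-memory data rather than pokemon.csv, so the values are illustrative only.

```python
import io
import pandas as pd

# Hypothetical sample data; the real pokemon.csv may differ.
csv_text = """Name,Type,HP
Bulbasaur,Grass,45
Charmander,Fire,39
Squirtle,Water,44
"""

# Reading a single column still yields a one-column DataFrame ...
one_col = pd.read_csv(io.StringIO(csv_text), usecols=["Type"])

# ... and DataFrame.squeeze("columns") turns it into a Series.
types = one_col.squeeze("columns")

# skiprows=[1, 2] skips file lines 1 and 2 (line 0 is the header),
# so only the last data row remains.
tail = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

print(type(one_col).__name__)  # DataFrame
print(type(types).__name__)    # Series
print(tail["Name"].tolist())   # ['Squirtle']
```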