How to skip rows while reading a csv file using Pandas?
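Python is a great language for data analysis, largely because of its rich ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
Here we discuss how to skip rows while reading a csv file. We will use the read_csv() method of the Pandas library for this task.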
Syntax: pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
Note: this signature comes from an older pandas release. Several of these parameters (for example squeeze, prefix, mangle_dupe_cols, tupleize_cols, error_bad_lines and warn_bad_lines) have since been deprecated or removed, so check the documentation for the version you have installed.
Some useful parameters are listed below:
Parameter: Use
filepath_or_buffer: URL or directory location of the file
sep: Stands for separator; the default is ',' as in csv (comma separated values)
index_col: Uses the passed column as the index instead of 0, 1, 2, 3, ...
header: Uses the passed row(s) [int or list of int] as the header
usecols: Uses only the passed columns [list of strings] to build the data frame
squeeze: If True and only one column is passed, returns a pandas Series
skiprows: Skips the passed rows when building the data frame
skipfooter: Skips the given number of lines at the bottom of the file
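As a rough illustration of how a few of these parameters work together, the sketch below reads a small in-memory CSV. The sample data and the column names ID, Name and Marks are made up for this example and are not the contents of the article's csv file.

```python
import io

import pandas as pd

# Hypothetical sample data (an assumption, not the article's file)
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
"""

# header=0 takes the column names from line 0,
# usecols keeps only the listed columns,
# index_col turns the ID column into the index
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=0,
    usecols=["ID", "Marks"],
    index_col="ID",
)

print(df)
```

Only the Marks column remains as data here, because ID is consumed as the index and Name is left out by usecols.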
The examples below read a file named students.csv, which is assumed to be in the current working directory.
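Method 1: Skipping N rows from the start while reading a csv file.
Code:
Python3
# Import the Pandas library
import pandas as pd

# Skip the first 2 lines of the file (lines 0 and 1, so the
# header line is skipped too) and load the rest into a dataframe;
# the first remaining line is used as the column names
df = pd.read_csv("students.csv", skiprows=2)

# Show the dataframe
print(df)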
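Because the result depends on the file, here is a self-contained sketch of the same call on an in-memory CSV. The data below is invented for illustration (it is an assumption, not the contents of students.csv).

```python
import io

import pandas as pd

# Hypothetical sample data: line 0 is the header, lines 1-6 are records
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dina,68
5,Eli,77
6,Fay,85
"""

# An integer skips that many lines from the top, header included,
# so line 2 ("2,Ben,74") ends up being read as the column names
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2)

print(df.columns.tolist())
print(df)
```

With an integer skiprows the original header is lost, which is the motivation for the list-based form used when the column names should be kept.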
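Method 2: Skipping rows at specific positions while reading a csv file.
Code:
Python3
# Import the Pandas library
import pandas as pd

# Skip the lines at positions 0, 2 and 5 (0-based line numbers;
# line 0 is the header line, so line 1 supplies the column names)
df = pd.read_csv("students.csv", skiprows=[0, 2, 5])

# Show the dataframe
print(df)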
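The effect of a list of positions is easier to see on a small in-memory CSV. The sketch below uses invented sample data (an assumption, not the contents of students.csv).

```python
import io

import pandas as pd

# Hypothetical sample data: line 0 is the header, lines 1-6 are records
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dina,68
5,Eli,77
6,Fay,85
"""

# Lines 0, 2 and 5 are dropped before parsing, so line 1
# ("1,Asha,81") is read as the header and lines 3, 4, 6 as data
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 2, 5])

print(df.columns.tolist())
print(df)
```

Leaving 0 out of the list, for example skiprows=[2, 5], would keep the original header and drop only those two records.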
Method 3: Skipping N rows from the start, except the column names, while reading a csv file.
Code:
Python3
# Import the Pandas library
import pandas as pd

# Skip the first 2 data rows (lines 1 and 2) but keep
# line 0, which holds the column names
df = pd.read_csv("students.csv", skiprows=range(1, 3))

# Show the dataframe
print(df)
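A minimal sketch of this header-preserving skip on in-memory data follows. The sample records are invented for illustration and are an assumption, not the contents of students.csv.

```python
import io

import pandas as pd

# Hypothetical sample data: line 0 is the header, lines 1-6 are records
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dina,68
5,Eli,77
6,Fay,85
"""

N = 2  # number of data rows to skip after the header

# range(1, N + 1) covers lines 1..N and leaves line 0 (the header) alone
df = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=range(1, N + 1))

print(df)
```

Starting the range at 1 instead of 0 is what keeps the column names in place.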
Method 4: Skipping rows based on a condition while reading a csv file.
Code:
Python3
# Import the Pandas library
import pandas as pd

# Return True for the line numbers to skip: every 3rd line,
# starting at line 0 (so the header line is skipped as well)
def logic(index):
    return index % 3 == 0

# Skip rows based on the condition; read_csv calls
# logic with each 0-based line number
df = pd.read_csv("students.csv", skiprows=logic)

# Show the dataframe
print(df)
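The sketch below runs the same kind of condition on an in-memory CSV and adds a variant that keeps the header line. The sample data and the helper name skip_every_third_keep_header are hypothetical, introduced only for this illustration.

```python
import io

import pandas as pd

# Hypothetical sample data: line 0 is the header, lines 1-6 are records
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dina,68
5,Eli,77
6,Fay,85
"""


def logic(index):
    # True means "skip this line"; matches lines 0, 3, 6, ...
    return index % 3 == 0


def skip_every_third_keep_header(index):
    # Same condition, but never skips line 0 (the header)
    return index > 0 and index % 3 == 0


# Header line is skipped too, so line 1 supplies the column names
df_all = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=logic)

# Header is kept; only lines 3 and 6 are dropped
df_keep = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=skip_every_third_keep_header)

print(df_all)
print(df_keep)
```

Whether line 0 should be exempt depends on the file; the second function is one way to express that, not the only one.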
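Method 5: Skipping N rows from the end while reading a csv file.
Code:
Python3
# Import the Pandas library
import pandas as pd

# Skip the last 5 rows of the file; skipfooter is only
# supported by the Python parsing engine, so select it explicitly
df = pd.read_csv("students.csv", skipfooter=5, engine="python")

# Show the dataframe
print(df)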
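For a quick check of the footer behaviour, the sketch below applies skipfooter to an in-memory CSV. The sample data is invented (an assumption, not the contents of students.csv), and it skips 2 rows rather than 5 so that some records remain.

```python
import io

import pandas as pd

# Hypothetical sample data: line 0 is the header, lines 1-6 are records
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dina,68
5,Eli,77
6,Fay,85
"""

# Drop the last 2 lines of the input; engine="python" is
# passed because skipfooter relies on the Python parser
df = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2, engine="python")

print(df)
```

This pattern is typically useful when a file ends with summary or totals lines that are not part of the data.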