How to skip rows while reading a csv file using Pandas?

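Python is a great language for data analysis, largely because of its rich ecosystem of data-centric packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.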

Here we discuss how to skip rows while reading a csv file, using the read_csv() method of the Pandas library.

Syntax: pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

Note: this signature reflects an older pandas release. Several of these parameters (for example squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines) were removed in pandas 2.0.


Some useful parameters are listed below:

Parameter	Use
filepath_or_buffer	Path or URL of the file, or a file-like object
sep	Separator; the default is ',' as in csv (comma-separated values)
index_col	Column(s) to use as the index instead of the default 0, 1, 2, 3, …
header	Row number(s) [int or list of int] to use as the column names
usecols	Subset of columns [list of names or positions] to load into the data frame
squeeze	If True and only one column is parsed, returns a pandas Series (removed in pandas 2.0)
skiprows	Rows to skip: an int (number of lines from the start), a list of 0-based line numbers, or a callable
skipfooter	Number of lines to skip at the bottom of the file
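As a quick illustration of the parameters above, the following sketch reads a small semicolon-separated table from an in-memory string instead of a file. The sample data, the column names and the `CSV_TEXT` variable are assumptions made up for this example; they are not the contents of students.csv.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """ID;Name;Marks;City
1;Asha;81;Pune
2;Ben;74;Leeds
3;Chen;90;Taipei
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),            # any file-like object works here
    sep=";",                          # the fields are separated by semicolons
    usecols=["ID", "Name", "Marks"],  # load only these columns
    index_col="ID",                   # use the ID column as the index
)

print(df)
```

The resulting data frame is indexed by ID and contains only the Name and Marks columns; the City column is never loaded.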

The examples below read a file named students.csv.

Method 1: Skip N rows from the start while reading a csv file.

Code:

Python3

# Importing Pandas library
import pandas as pd

# Skipping the first 2 lines of the csv file
# (lines 0 and 1, so the original column-name line is skipped too)
# and loading the rest into a dataframe
df = pd.read_csv("students.csv",
                 skiprows=2)

# Show the dataframe
df

Output:

[Image: csv file contents]
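Because an integer skiprows counts lines from the very top of the file, the column-name line is dropped as well, and the first line that survives is used as the header. The sketch below shows this on hypothetical in-memory data (the names and marks are invented for illustration), together with one possible remedy: passing the column names explicitly through names.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for students.csv.
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dev,66
5,Eva,88
6,Farid,79
"""

# skiprows=2 drops lines 0 and 1, so line 2 ("2,Ben,74") becomes the header.
df_skip = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=2)
print(df_skip.columns.tolist())

# Supplying names= means no line is treated as a header,
# so every remaining line is read as a data row.
df_named = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=2,
    names=["ID", "Name", "Marks"],
)
print(df_named)
```

In the first call a data row is consumed as the header, so only four rows remain; in the second call all five remaining lines are kept as data.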

Method 2: Skip rows at specific positions while reading a csv file.

Code:

Python3

# Importing Pandas library
import pandas as pd

# Skipping the lines at positions 0, 2 and 5
# (0-based; line 0 is the column-name line, so it is skipped too)
df = pd.read_csv("students.csv",
                 skiprows=[0, 2, 5])

# Show the dataframe
df

Output:

[Image: csv file contents]
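The positions in a skiprows list are 0-based line numbers in the file, with the column-name line counted as line 0. The following sketch, again on invented in-memory data rather than the real students.csv, compares a list that includes position 0 with one that does not.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for students.csv.
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dev,66
5,Eva,88
6,Farid,79
"""

# Lines 0, 2 and 5 are skipped; line 1 ("1,Asha,81") becomes the header.
df_with_zero = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[0, 2, 5])
print(df_with_zero)

# Only lines 2 and 5 are skipped; the original header is kept.
df_without_zero = pd.read_csv(io.StringIO(CSV_TEXT), skiprows=[2, 5])
print(df_without_zero)
```

Leaving 0 out of the list is usually what is wanted when the first line holds the column names.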

Method 3: Skip N rows from the start, except the column names, while reading a csv file.

Code:

Python3

# Importing Pandas library
import pandas as pd

# Skipping the first 2 data rows (lines 1 and 2)
# while keeping line 0 as the column names
df = pd.read_csv("students.csv",
                 skiprows=list(range(1, 3)))

# Show the dataframe
df

Output:

[Image: csv file contents]
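The same idea can be wrapped in a small helper so that N becomes a parameter. The function name `read_skipping_first_rows` and the sample data are hypothetical; this is only a sketch of one way to generalize the call above.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for students.csv.
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dev,66
5,Eva,88
6,Farid,79
"""


def read_skipping_first_rows(source, n):
    """Read a CSV, keeping the header line and skipping the first n data rows."""
    # Line 0 is the header, so the first n data rows are lines 1..n.
    return pd.read_csv(source, skiprows=list(range(1, n + 1)))


df = read_skipping_first_rows(io.StringIO(CSV_TEXT), 2)
print(df)
```

With n set to 2, the rows for IDs 1 and 2 are dropped and the column names are preserved.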

Method 4: Skip rows based on a condition while reading a csv file.

Code:

Python3

# Importing Pandas library
import pandas as pd

# Return True for every line whose 0-based position
# is a multiple of 3 (lines 0, 3, 6, ...)
def logic(index):
    return index % 3 == 0

# Skipping rows based on a condition; line 0 holds
# the column names and is skipped as well
df = pd.read_csv("students.csv",
                 skiprows=logic)

# Show the dataframe
df

Output:

[Image: csv file contents]
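A callable passed to skiprows receives each 0-based line number and returns True for the lines to skip. If the column names should survive, the condition can exclude line 0 explicitly. The sketch below shows this variant on invented in-memory data; the condition itself is only an example.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for students.csv.
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
4,Dev,66
5,Eva,88
6,Farid,79
"""

# Skip every 3rd line (3, 6, ...) but never line 0, which holds the column names.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    skiprows=lambda i: i > 0 and i % 3 == 0,
)

print(df)
```

Here lines 3 and 6 are skipped, so the rows for IDs 3 and 6 are missing from the result while the header is intact.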

Method 5: Skip N rows from the end while reading a csv file.

Code:

Python3

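# Importing Pandas library
import pandas as pd

# Skipping 5 rows from the end; skipfooter is
# not supported by the default C engine, so the
# Python engine is selected explicitly
df = pd.read_csv("students.csv",
                 skipfooter=5,
                 engine="python")

# Show the dataframe
df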

Output:

[Image: csv file contents]
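A typical use of skipfooter is dropping summary lines that an export tool appends after the data. The sketch below assumes such a file, with two invented trailer lines ("Total" and "Average"); the data is hypothetical and is read from memory rather than from students.csv.

```python
import io

import pandas as pd

# Hypothetical export with two summary lines at the bottom.
CSV_TEXT = """ID,Name,Marks
1,Asha,81
2,Ben,74
3,Chen,90
Total,,245
Average,,81.67
"""

# skipfooter=2 drops the last two lines; engine="python" is set explicitly
# because the C engine does not support skipfooter.
df = pd.read_csv(io.StringIO(CSV_TEXT), skipfooter=2, engine="python")

print(df)
```

Without skipfooter the two summary lines would be read as data rows, and the ID column could no longer be parsed as integers.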