How to read a CSV file into a DataFrame with a custom delimiter in Pandas?

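Python is a great language for data analysis, largely because of its ecosystem of data-centric packages. pandas is one of those packages, and it makes importing and analyzing data much easier.
Here we discuss how to load a csv file into a DataFrame. This is done with the pandas.read_csv() method, so the pandas library has to be imported first.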

Syntax: pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

Note: this is the signature from an older pandas release. Several of these parameters (for example squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines) were deprecated and later removed, so check the documentation for your installed version.

Some useful parameters are given below:

Parameter	Use
filepath_or_buffer	URL or path of the file to read
sep	Stands for separator; the default is ',' as in csv (comma-separated values)
index_col	Column(s) to use as the index instead of the default 0, 1, 2, 3, …
header	Row number(s) (int or list of int) to use as the column names
usecols	Only the listed columns (e.g. a list of strings) are used to build the DataFrame
squeeze	If True and only one column is parsed, a pandas Series is returned (removed in pandas 2.0)
skiprows	Rows to skip at the start of the file (a count, or a list of row numbers)
skipfooter	Number of lines to skip at the bottom of the file
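
To make the table concrete, here is a minimal sketch that combines a few of these parameters. The in-memory file, its column names and its values are made up for illustration; only the read_csv() arguments come from the table above.

```python
import io
import pandas as pd

# Hypothetical in-memory file: one junk line on top, a header row,
# three data rows, and one summary line at the bottom.
raw = io.StringIO(
    "exported by some tool\n"
    "id,name,score,city\n"
    "1,Ann,90,Oslo\n"
    "2,Bob,85,Rome\n"
    "3,Cy,70,Lima\n"
    "total rows: 3\n"
)

df = pd.read_csv(
    raw,
    skiprows=1,                       # skip the junk line before the header
    skipfooter=1,                     # drop the summary line at the bottom
    usecols=["id", "name", "score"],  # keep only these columns
    index_col="id",                   # use the 'id' column as the row index
    engine="python",                  # skipfooter is not supported by the C engine
)

print(df)
```

The result should have 'id' as its index and only the 'name' and 'score' columns.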

By default this method uses a comma (',') as the delimiter, but we can also pass a custom delimiter or a regular expression as the separator.
Example 1: Using the read_csv() method with the default separator, i.e. comma (,)

Python3

# Importing pandas library
import pandas as pd

# Load the data of example1.csv into a DataFrame df,
# using the default separator ','
df = pd.read_csv('example1.csv')

# Print the DataFrame
print(df)

Output:

CSV file with commas

Example 2: Using the read_csv() method with '_' as a custom delimiter.

Python3

# Importing pandas library
import pandas as pd

# Load the data of example2.csv into a DataFrame df,
# with '_' as the custom delimiter.
# engine='python' is optional for a single-character separator.
df = pd.read_csv('example2.csv',
                 sep='_',
                 engine='python')

# Print the DataFrame
print(df)

Output:

CSV file with underscores

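Note: A single-character delimiter such as '_' or a tab is handled by the default C engine. A separator longer than one character (other than '\s+') is interpreted as a regular expression, which only the Python engine supports. In that case specify engine='python'; otherwise pandas falls back to the Python engine and emits a ParserWarning like the one shown below:

pandas engine warning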
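
The engine behaviour can be checked with a small sketch. The helper read_and_collect() and the sample strings are hypothetical and exist only for this illustration; the point is which calls produce a pandas.errors.ParserWarning.

```python
import io
import warnings
import pandas as pd


def read_and_collect(text, **kwargs):
    """Read CSV text and return (DataFrame, list of warning categories raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_csv(io.StringIO(text), **kwargs)
    return df, [w.category for w in caught]


# Single-character separator: parsed by the default C engine.
df_single, warn_single = read_and_collect("a_b\n1_2\n", sep="_")

# Regex separator without engine=...: pandas falls back to the
# Python engine and reports that with a ParserWarning.
df_regex, warn_regex = read_and_collect("a:b|c\n1|2:3\n", sep="[:|]")

# Same regex separator with engine='python': no fallback is needed.
df_quiet, warn_quiet = read_and_collect(
    "a:b|c\n1|2:3\n", sep="[:|]", engine="python"
)

print(pd.errors.ParserWarning in warn_single)
print(pd.errors.ParserWarning in warn_regex)
print(pd.errors.ParserWarning in warn_quiet)
```

Only the second call should report the warning; the parsed data is the same with or without an explicit engine.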

Example 3: Using the read_csv() method with tab as a custom delimiter.

Python3

# Importing pandas library
import pandas as pd

# Load the data of example3.csv into a DataFrame df,
# with tab as the custom delimiter.
# engine='python' is optional for a single-character separator.
df = pd.read_csv('example3.csv',
                 sep='\t',
                 engine='python')

# Print the DataFrame
print(df)

Output:

CSV file with tabs
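
Since example3.csv is not reproduced here, the sketch below uses a made-up tab-separated string instead. It also shows pd.read_table(), which uses tab as its default separator and so should give the same result as read_csv() with sep='\t'.

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for example3.csv.
tsv_text = "name\tage\tcity\nAnn\t31\tOslo\nBob\t27\tRome\n"

# Explicit tab separator with read_csv().
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# read_table() defaults to sep='\t', so no separator is passed.
df_table = pd.read_table(io.StringIO(tsv_text))

print(df_csv)
print(df_csv.equals(df_table))
```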

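Example 4: Using the read_csv() method with a regular expression as the custom delimiter.
Suppose we have a csv file with several different delimiters, as shown below.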

totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4

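To load such a file into a DataFrame, we use a regular expression as the separator: a character class that lists every delimiter character matches whichever one appears between two fields.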

Python3

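# Importing pandas library
import pandas as pd

# Load the data of example4.csv into a DataFrame df,
# with a regular expression as the custom delimiter.
# Fields are separated by ':', ',', '|' or '_', optionally
# followed by spaces; consuming the spaces with \s* avoids
# empty columns where a comma is followed by a space.
df = pd.read_csv('example4.csv',
                 sep=r'[:,|_]\s*',
                 engine='python')

# Print the DataFrame
print(df)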

Output:

CSV file with a regular-expression delimiter
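
Because example4.csv is not included here, the following minimal sketch feeds the sample rows from this example through io.StringIO. It assumes the spaces after the commas are really part of the file, so the pattern used (an assumption for this sketch, written as r'[:,|_]\s*') matches one delimiter character plus any spaces that follow it.

```python
import io
import pandas as pd

# The sample rows from Example 4, kept exactly as shown in the text.
sample = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4
"""

# One of ':', ',', '|', '_' followed by optional whitespace.
df = pd.read_csv(io.StringIO(sample),
                 sep=r"[:,|_]\s*",
                 engine="python")

print(df.columns.tolist())
print(df)
```

With this pattern every line splits into the same seven fields, so the header and the six data rows line up.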