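How to read a CSV file into a DataFrame with a custom delimiter in Pandas?
Python is a great language for data analysis, largely because of its fantastic ecosystem of data-centric packages. Pandas is one of these packages, and it makes importing and analyzing data much easier.
Here, we will discuss how to load a CSV file into a DataFrame. This is done with the pandas.read_csv() method, and we have to import the pandas library to use it.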
Syntax: pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
This signature comes from an older pandas release. Several of these parameters (for example squeeze, prefix, mangle_dupe_cols, error_bad_lines and warn_bad_lines) were removed in pandas 2.0, so check the read_csv documentation for the version you have installed.
Some useful parameters are listed below:
Parameter | Use |
---|---|
filepath_or_buffer | Path, URL, or file-like object to read the data from |
sep | Stands for separator; the default is ',' as in CSV (comma-separated values) |
index_col | Column(s) to use as the row index instead of the default 0, 1, 2, 3, … |
header | Row number(s) (int or list of ints) to use as the column names |
usecols | Only the passed columns (e.g. a list of column names) are used to build the DataFrame |
squeeze | If True and the parsed data has only one column, return a pandas Series (removed in pandas 2.0) |
skiprows | Line numbers to skip (list of ints) or number of lines to skip (int) at the start of the file |
skipfooter | Number of lines at the bottom of the file to skip (not supported by the C engine) |
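Several of the parameters in the table can be combined in a single call. The sketch below is self-contained: instead of a file on disk it reads a made-up, in-memory CSV through io.StringIO (read_csv accepts any file-like object), and it assumes engine='python' because skipfooter is not supported by the C engine.

```python
import io

import pandas as pd

# Hypothetical file contents: a note line at the top, a header row,
# three data rows, and a footer line at the bottom
raw = """# exported report
id,name,score,city
1,Alice,90,Paris
2,Bob,85,Rome
3,Cara,77,Oslo
end of report
"""

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,                       # skip the note line at the top
    header=0,                         # first remaining row holds the column names
    usecols=["id", "name", "score"],  # keep only these columns
    index_col="id",                   # use the 'id' column as the row index
    skipfooter=1,                     # drop the footer line at the bottom
    engine="python",                  # skipfooter requires the Python engine
)
print(df)
```

The 'city' column is dropped by usecols, and 'id' becomes the index rather than a regular column.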
This method uses a comma ',' as the default separator, but we can also pass a custom separator or a regular expression as the separator.
Example 1: Using the read_csv() method with the default separator, i.e. a comma (,)
# Import the pandas library
import pandas as pd
# Load the data of example1.csv into a
# DataFrame df (the default separator ',' is used)
df = pd.read_csv('example1.csv')
# Print the DataFrame
print(df)
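Since example1.csv is not included here, this self-contained sketch makes the same call on a small in-memory CSV. The data is hypothetical, and io.StringIO stands in for the file. With no sep argument, read_csv splits each line on commas.

```python
import io

import pandas as pd

# Hypothetical comma-separated data standing in for example1.csv
raw = "name,age,city\nAsha,31,Pune\nRavi,27,Delhi\n"

# No sep argument: the default separator ',' is used
df = pd.read_csv(io.StringIO(raw))
print(df)
```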
Example 2: Using the read_csv() method with '_' as a custom separator.
# Import the pandas library
import pandas as pd
# Load the data of example2.csv
# with '_' as a custom separator
# into a DataFrame df
df = pd.read_csv('example2.csv',
                 sep='_',
                 engine='python')
# Print the DataFrame
print(df)
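To see why sep matters, the sketch below parses the same hypothetical underscore-separated text twice (in memory via io.StringIO, standing in for example2.csv): once with the default comma separator and once with sep='_'.

```python
import io

import pandas as pd

# Hypothetical underscore-separated data standing in for example2.csv
raw = "name_age_city\nAsha_31_Pune\nRavi_27_Delhi\n"

# Default sep=',': no line contains a comma, so everything lands in one column
wrong = pd.read_csv(io.StringIO(raw))

# sep='_': every underscore starts a new field
df = pd.read_csv(io.StringIO(raw), sep="_")

print(wrong.shape)
print(df)
```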
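Note: a single-character separator such as '_' or '\t' is supported by the default C engine, so engine='python' is optional in Examples 2 and 3. It becomes necessary when the separator is longer than one character (other than '\s+'). In that case pandas treats the separator as a regular expression, which only the Python engine supports. If engine is not specified, pandas falls back to the Python engine and emits a ParserWarning.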
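The fallback described in the note can be observed directly. This minimal sketch uses made-up data and records warnings with the standard warnings module, keeping only pandas' ParserWarning. The helper read_and_collect is hypothetical and exists only for this illustration.

```python
import io
import warnings

import pandas as pd


def read_and_collect(raw, **kwargs):
    """Hypothetical helper: parse `raw` and return (DataFrame, ParserWarnings raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        df = pd.read_csv(io.StringIO(raw), **kwargs)
    parser_warnings = [w for w in caught if issubclass(w.category, pd.errors.ParserWarning)]
    return df, parser_warnings


single_char = "a_b\n1_2\n3_4\n"    # one-character separator '_'
multi_char = "a::b\n1::2\n3::4\n"  # two-character separator '::'

# Single-character sep: handled by the default C engine, no ParserWarning
df_single, w_single = read_and_collect(single_char, sep="_")

# Multi-character sep, engine not given: pandas falls back to the Python engine and warns
df_fallback, w_fallback = read_and_collect(multi_char, sep="::")

# Same sep with engine='python' stated explicitly: no fallback warning
df_python, w_python = read_and_collect(multi_char, sep="::", engine="python")

print(len(w_single), len(w_fallback), len(w_python))
```

Because multi-character separators are regular expressions, characters with a special meaning in regex must be escaped. A literal '||' separator, for instance, would be passed as sep=r'\|\|'.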
Example 3: Using the read_csv() method with a tab as a custom separator.
# Import the pandas library
import pandas as pd
# Load the data of example3.csv
# with a tab as a custom separator
# into a DataFrame df
df = pd.read_csv('example3.csv',
                 sep='\t',
                 engine='python')
# Print the DataFrame
print(df)
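Here is a self-contained sketch of the tab case on hypothetical in-memory data (standing in for example3.csv). In a Python string, '\t' is a single tab character, so the C engine can handle it. pandas also provides pd.read_table, whose default separator is '\t', and the sketch checks that both calls give the same DataFrame.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for example3.csv
raw = "name\tage\tcity\nAsha\t31\tPune\nRavi\t27\tDelhi\n"

# sep='\t' splits on tab characters
df = pd.read_csv(io.StringIO(raw), sep="\t")

# read_table defaults to sep='\t', so no separator needs to be passed
df_table = pd.read_table(io.StringIO(raw))

print(df)
print(df.equals(df_table))
```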
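Example 4: Using the read_csv() method with a regular expression as a custom separator.
Suppose we have a CSV file that mixes several kinds of separators, as shown below.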
totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4
To load such a file into a DataFrame, we use a regular expression as the separator.
# Import the pandas library
import pandas as pd
# Load the data of example4.csv with a regular
# expression as the separator: split on ':', ',',
# '|' or '_', plus any spaces that follow it
df = pd.read_csv('example4.csv',
                 sep=r'[:,|_]\s*',
                 engine='python')
# Print the DataFrame
print(df)
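The sketch below runs a regular-expression separator on an in-memory copy of the sample data above, which stands in for example4.csv. It assumes the pattern r'[:,|_]\s*'. The character class matches any one of ':', ',', '|' or '_', and the trailing \s* also consumes the space that follows each comma in this data, so the comma-space pairs do not produce empty fields.

```python
import io

import pandas as pd

# In-memory copy of the mixed-separator sample data (stands in for example4.csv)
raw = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4
"""

# [:,|_] matches one separator character; \s* swallows any spaces after it.
# Regex separators require the Python engine.
df = pd.read_csv(io.StringIO(raw), sep=r"[:,|_]\s*", engine="python")
print(df)
```

Inside a character class, '|' is a literal character rather than regex alternation, so it does not need escaping there.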