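How to merge two CSV files by a specific column using Pandas in Python?
In this article, we will discuss how to merge two CSV files. The pandas library provides a function for this, pandas.merge(). Merging combines two datasets into one based on a common attribute or column.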
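Syntax: pandas.merge(left, right, how='inner', on=None, ...)
Parameters:
- left, right: DataFrames to merge (data1 and data2 in the examples below).
- how: {'left', 'right', 'outer', 'inner'}, default 'inner'. The type of merge to perform.
- on: label or list. Column name(s) to join on; they must be present in both DataFrames.
Returns: A DataFrame of the two merged objects.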
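As a sketch of passing a list to on, the example below uses two small hypothetical DataFrames (the BRANCH, LOAN_NO, AMOUNT and NAME columns are invented for illustration). With a list, a pair of rows is matched only when every listed column is equal.

```python
import pandas as pd

# Hypothetical data: a loan is identified by BRANCH and LOAN_NO together
left = pd.DataFrame({"BRANCH": ["A", "A", "B"],
                     "LOAN_NO": [1, 2, 1],
                     "AMOUNT": [500, 700, 900]})
right = pd.DataFrame({"BRANCH": ["A", "B", "B"],
                      "LOAN_NO": [1, 1, 2],
                      "NAME": ["Asha", "Ben", "Chen"]})

# With a list, rows match only when ALL listed columns are equal:
# (A, 1) and (B, 1) occur in both frames; (A, 2) and (B, 2) do not
merged = pd.merge(left, right, on=["BRANCH", "LOAN_NO"], how="inner")
print(merged)
```

Matching on LOAN_NO alone would instead pair rows from different branches that happen to share a loan number.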
There are 4 types of merges:
- Inner
- Left
- Right
- Outer
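To see how the four types differ, here is a minimal sketch that runs all four on the same pair of small hypothetical DataFrames, in which loan 101 exists only on the left and loan 104 only on the right.

```python
import pandas as pd

# Hypothetical key sets: 101 only on the left, 104 only on the right
loans = pd.DataFrame({"LOAN_NO": [101, 102, 103], "AMOUNT": [5000, 12000, 800]})
borrowers = pd.DataFrame({"LOAN_NO": [102, 103, 104], "NAME": ["Asha", "Ben", "Chen"]})

counts = {}
for how in ["inner", "left", "right", "outer"]:
    result = pd.merge(loans, borrowers, on="LOAN_NO", how=how)
    # Record which keys survive each type of merge
    counts[how] = sorted(result["LOAN_NO"].tolist())
    print(how, counts[how])
```

Inner keeps only the shared keys 102 and 103, left adds 101, right adds 104, and outer keeps all four.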
We will use the following two CSV files, loan.csv and borrower.csv, to perform all the operations.
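The contents of the two files are not reproduced here. The sketch below assumes hypothetical contents (an AMOUNT column in loan.csv and a NAME column in borrower.csv; only LOAN_NO is taken from the examples) and uses io.StringIO in place of real files so that it runs on its own. It also checks that the join column is present in both DataFrames, which on= requires.

```python
import io
import pandas as pd

# Hypothetical stand-ins for loan.csv and borrower.csv (not the real data);
# only the shared LOAN_NO column comes from the examples in this article
loan_csv = io.StringIO("LOAN_NO,AMOUNT\n101,5000\n102,12000\n103,800\n")
borrower_csv = io.StringIO("LOAN_NO,NAME\n102,Asha\n103,Ben\n104,Chen\n")

# read_csv accepts any file-like object, not only a path
data1 = pd.read_csv(loan_csv)
data2 = pd.read_csv(borrower_csv)

# The column passed to on= must exist in both DataFrames
shared = sorted(set(data1.columns) & set(data2.columns))
print(shared)  # ['LOAN_NO']
```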
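Inner Join
Setting how='inner' merges the two DataFrames on the specified column and returns a new DataFrame containing only the rows whose values in that column match in both original DataFrames.
Code: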
Python3
import pandas as pd
# reading two csv files
data1 = pd.read_csv('datasets/loan.csv')
data2 = pd.read_csv('datasets/borrower.csv')
# using merge function by setting how='inner'
output1 = pd.merge(data1, data2,
on='LOAN_NO',
how='inner')
# displaying result
print(output1)
Output:
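A self-contained sketch of the inner join, using hypothetical stand-in data rather than the article's real files, shows which keys survive.

```python
import io
import pandas as pd

# Hypothetical file contents (not the article's real data)
data1 = pd.read_csv(io.StringIO("LOAN_NO,AMOUNT\n101,5000\n102,12000\n103,800\n"))
data2 = pd.read_csv(io.StringIO("LOAN_NO,NAME\n102,Asha\n103,Ben\n104,Chen\n"))

output1 = pd.merge(data1, data2, on="LOAN_NO", how="inner")
print(output1)

# Only keys present in both inputs survive, so no NaN is introduced
print(output1["LOAN_NO"].tolist())  # [102, 103]
print(output1.isna().any().any())   # False
```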
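Left Outer Join
Setting how='left' merges the two DataFrames on the specified column and returns a new DataFrame containing all rows from the left DataFrame, including those with no matching value in the right DataFrame; for those rows, the right DataFrame's columns are set to NaN.
Code:
Python3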
import pandas as pd
# reading csv files
data1 = pd.read_csv('datasets/loan.csv')
data2 = pd.read_csv('datasets/borrower.csv')
# using merge function by setting how='left'
output2 = pd.merge(data1, data2,
on='LOAN_NO',
how='left')
# displaying result
print(output2)
Output:
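To check which left rows found no partner, merge() accepts indicator=True, which adds a _merge column recording where each row came from. A minimal sketch on hypothetical stand-in data:

```python
import io
import pandas as pd

# Hypothetical file contents (not the article's real data)
data1 = pd.read_csv(io.StringIO("LOAN_NO,AMOUNT\n101,5000\n102,12000\n103,800\n"))
data2 = pd.read_csv(io.StringIO("LOAN_NO,NAME\n102,Asha\n103,Ben\n104,Chen\n"))

# indicator=True adds a "_merge" column: "left_only", "right_only" or "both"
output2 = pd.merge(data1, data2, on="LOAN_NO", how="left", indicator=True)
print(output2)

# Left rows with no partner in data2 are "left_only" and get NaN for NAME
unmatched = output2[output2["_merge"] == "left_only"]
print(unmatched["LOAN_NO"].tolist())  # [101]
```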
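Right Outer Join
Setting how='right' merges the two DataFrames on the specified column and returns a new DataFrame containing all rows from the right DataFrame, including those with no matching value in the left DataFrame; for those rows, the left DataFrame's columns are set to NaN.
Code:
Python3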
import pandas as pd
# reading csv files
data1 = pd.read_csv('datasets/loan.csv')
data2 = pd.read_csv('datasets/borrower.csv')
# using merge function by setting how='right'
output3 = pd.merge(data1, data2,
on='LOAN_NO',
how='right')
# displaying result
print(output3)
Output:
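A right join is the mirror image of a left join. As a sketch on hypothetical stand-in data, the right join below yields the same rows as a left join with the two inputs swapped; only the column order differs, and sorting by key before comparing guards against any row-order differences.

```python
import io
import pandas as pd

# Hypothetical file contents (not the article's real data)
data1 = pd.read_csv(io.StringIO("LOAN_NO,AMOUNT\n101,5000\n102,12000\n103,800\n"))
data2 = pd.read_csv(io.StringIO("LOAN_NO,NAME\n102,Asha\n103,Ben\n104,Chen\n"))

# Keeps every data2 row; AMOUNT is NaN where data1 has no match (loan 104)
output3 = pd.merge(data1, data2, on="LOAN_NO", how="right")
print(output3)

# Left join with swapped inputs, then align column order and row order
swapped = pd.merge(data2, data1, on="LOAN_NO", how="left")
a = output3.sort_values("LOAN_NO").reset_index(drop=True)
b = swapped[output3.columns].sort_values("LOAN_NO").reset_index(drop=True)
print(a.equals(b))  # True
```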
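Full Outer Join
Setting how='outer' merges the two DataFrames on the specified column and returns a new DataFrame containing the rows from both DataFrames, with NaN filled in wherever one of the DataFrames has no matching data.
Code:
Python3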
import pandas as pd
# reading csv files
data1 = pd.read_csv('datasets/loan.csv')
data2 = pd.read_csv('datasets/borrower.csv')
# using merge function by setting how='outer'
output4 = pd.merge(data1, data2,
on='LOAN_NO',
how='outer')
# displaying result
print(output4)
Output:
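With indicator=True, an outer join labels every row as left_only, right_only or both, which makes the filled-in NaN values easy to trace. A sketch on hypothetical stand-in data:

```python
import io
import pandas as pd

# Hypothetical file contents (not the article's real data)
data1 = pd.read_csv(io.StringIO("LOAN_NO,AMOUNT\n101,5000\n102,12000\n103,800\n"))
data2 = pd.read_csv(io.StringIO("LOAN_NO,NAME\n102,Asha\n103,Ben\n104,Chen\n"))

output4 = pd.merge(data1, data2, on="LOAN_NO", how="outer", indicator=True)
print(output4)

# Every key from either file appears once; _merge shows which side(s) had it
origin = dict(zip(output4["LOAN_NO"], output4["_merge"].astype(str)))
print(origin)
```

Here loan 101 gets NaN for NAME and loan 104 gets NaN for AMOUNT, while 102 and 103 are complete.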