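Merge Two Pandas DataFrames Based on Closest DateTime
In this article, we discuss how to merge Pandas DataFrames based on the closest DateTime. To merge DataFrames you first need to know how to create them; for that, see the article Creating a Pandas DataFrame. Once the DataFrames exist, they can be merged with the function merge_asof(), whose signature is: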
pandas.merge_asof(left, right, on=None, left_on=None, right_on=None, left_index=False, right_index=False, by=None, left_by=None, right_by=None, suffixes=('_x', '_y'), tolerance=None, allow_exact_matches=True, direction='backward')
Note:
- To learn more about this function, see the article pandas.merge_asof() function in Python
- Both DataFrames must be sorted by the key.
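The sorting requirement can be sketched as follows. This is a minimal illustration, not part of the original walkthrough: the `quotes` and `trades` frames and their values are made up for the example, and `quotes` is deliberately created out of order.

```python
import pandas as pd

# Hypothetical data for illustration only; quotes is intentionally unsorted.
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.030",
                            "2020-03-25 13:30:00.023"]),
    "bid": [51.97, 51.95],
})
trades = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.025",
                            "2020-03-25 13:30:00.040"]),
    "price": [51.96, 51.98],
})

# merge_asof raises an error when the key column is not sorted,
# so sort both frames by the key before merging.
quotes_sorted = quotes.sort_values("time").reset_index(drop=True)
trades_sorted = trades.sort_values("time").reset_index(drop=True)

# Each trade picks up the most recent quote at or before its time.
merged = pd.merge_asof(trades_sorted, quotes_sorted, on="time")
print(merged)
```

Sorting with `sort_values()` on the key column before the merge is usually the simplest way to satisfy this requirement.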
Stepwise approach
Step 1: Import the pandas library
To complete this task, we need to import the library named pandas.
import pandas as pd
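Step 2: Create the DataFrames
In this step, we create the DataFrames with the function pd.DataFrame(). We create two DataFrames, one named left and the other named right, because our final goal is to merge two DataFrames based on the closest DateTime. They can be written as: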
left = pd.DataFrame({
    "time": [pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.023"),
             pd.Timestamp("2020-03-25 13:30:00.030"),
             pd.Timestamp("2020-03-25 13:30:00.041"),
             pd.Timestamp("2020-03-25 13:30:00.048"),
             pd.Timestamp("2020-03-25 13:30:00.049"),
             pd.Timestamp("2020-03-25 13:30:00.072"),
             pd.Timestamp("2020-03-25 13:30:00.075")
             ],
    "ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG", "AAPL", "GOOG", "MSFT"],
    "bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01],
    "ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01, 720.88, 52.03]
})
right = pd.DataFrame({
    "time": [
        pd.Timestamp("2020-03-25 13:30:00.023"),
        pd.Timestamp("2020-03-25 13:30:00.038"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048"),
        pd.Timestamp("2020-03-25 13:30:00.048")
    ],
    "ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
    "price": [51.95, 51.95, 720.77, 720.92, 98.0],
    "quantity": [75, 155, 100, 100, 100]
})
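Step 3: Merge the DataFrames and print the result
In this step, the DataFrames are merged with the function pd.merge_asof(). The result of merge_asof() is stored in a variable, and that variable is then printed with print(). Here the right DataFrame is passed first.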
Python3
# Importing the required package
import pandas as pd
# Creating the DataFrame of left side
left = pd.DataFrame({
"time": [pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.030"),
pd.Timestamp("2020-03-25 13:30:00.041"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.049"),
pd.Timestamp("2020-03-25 13:30:00.072"),
pd.Timestamp("2020-03-25 13:30:00.075")
],
"ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
"AAPL", "GOOG", "MSFT"],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01,
720.88, 52.03]
})
# Creating the Dataframe of right side
right = pd.DataFrame({
"time": [
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.038"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(right, left, on="time",
by="ticker")
# print the variable
print(merged_dataframe)
Output:
Example 1: Now we swap the positions of the left and right DataFrames in the merge_asof() call.
Python3
# Importing the required package
import pandas as pd
# Creating the DataFrame of left side
left = pd.DataFrame({
"time": [pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.030"),
pd.Timestamp("2020-03-25 13:30:00.041"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.049"),
pd.Timestamp("2020-03-25 13:30:00.072"),
pd.Timestamp("2020-03-25 13:30:00.075")
],
"ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
"AAPL", "GOOG", "MSFT"],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01,
720.88, 52.03]
})
# Creating the Dataframe of right side
right = pd.DataFrame({
"time": [
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.038"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(left, right, on="time",
by="ticker")
# print the variable
print(merged_dataframe)
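Output:
Note: Comparing the two outputs, when the right DataFrame is passed first, the result has 5 rows, the same as the right DataFrame; when the left DataFrame is passed first, the result has 8 rows, the same as the left DataFrame. So merge_asof() behaves like a left join, except that it matches on the nearest key rather than on equal keys. With the default direction='backward', "nearest" means the last row of the second DataFrame whose key is less than or equal to the key in the first DataFrame.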
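The difference between the default backward search and a truly nearest match can be sketched with the direction parameter from the signature above. The small frames below are hypothetical and chosen only so that the two modes disagree.

```python
import pandas as pd

# Hypothetical single-row left frame; its key sits between two right keys.
left_small = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.037"]),
    "bid": [51.99],
})
right_small = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.023",
                            "2020-03-25 13:30:00.038"]),
    "price": [51.95, 51.96],
})

# Default: direction="backward" takes the last right key <= the left key.
backward = pd.merge_asof(left_small, right_small, on="time")

# direction="nearest" takes the closest right key on either side.
nearest = pd.merge_asof(left_small, right_small, on="time",
                        direction="nearest")

print(backward)
print(nearest)
```

In this sketch the backward merge picks the row at 13:30:00.023 (14 ms earlier), while the nearest merge picks the row at 13:30:00.038 (1 ms later).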
Example 2: We only match rows when the quote time and the trade time are within 2 ms of each other.
Python3
# Importing the required package
import pandas as pd
# Creating the DataFrame of left side
left = pd.DataFrame({
"time": [pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.030"),
pd.Timestamp("2020-03-25 13:30:00.041"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.049"),
pd.Timestamp("2020-03-25 13:30:00.072"),
pd.Timestamp("2020-03-25 13:30:00.075")
],
"ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
"AAPL", "GOOG", "MSFT"],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01,
720.88, 52.03]
})
# Creating the Dataframe of right side
right = pd.DataFrame({
"time": [
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.038"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(left, right, on="time", by="ticker",
tolerance=pd.Timedelta("2ms"))
# print the variable
print(merged_dataframe)
Output:
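As a smaller illustration of what tolerance does, the following sketch merges the same two frames with and without a 2 ms limit. The frames are hypothetical and not taken from the example above.

```python
import pandas as pd

# Hypothetical frames: the second left row has no right row within 2 ms.
left_small = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.030",
                            "2020-03-25 13:30:00.040"]),
    "bid": [51.97, 51.99],
})
right_small = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.029",
                            "2020-03-25 13:30:00.033"]),
    "price": [51.95, 51.96],
})

# Without tolerance, every left row takes the latest earlier right row.
no_tol = pd.merge_asof(left_small, right_small, on="time")

# With tolerance, matches further away than 2 ms are replaced by NaN.
with_tol = pd.merge_asof(left_small, right_small, on="time",
                         tolerance=pd.Timedelta("2ms"))

print(no_tol)
print(with_tol)
```

The row count is the same in both results; the tolerance only decides whether the columns coming from the second frame are filled or left as NaN.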
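Example 3: We perform the asof merge only when the quote time and the trade time are within 2 ms of each other, and we exclude exact matches on time. Earlier rows can still be carried forward to later rows as long as they fall inside the tolerance.
Python3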
# Importing the required package
import pandas as pd
# Creating the DataFrame of left side
left = pd.DataFrame({
"time": [pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.030"),
pd.Timestamp("2020-03-25 13:30:00.041"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.049"),
pd.Timestamp("2020-03-25 13:30:00.072"),
pd.Timestamp("2020-03-25 13:30:00.075")
],
"ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
"AAPL", "GOOG", "MSFT"],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01,
720.88, 52.03]
})
# Creating the Dataframe of right side
right = pd.DataFrame({
"time": [
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.038"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(left, right, on="time", by="ticker",
tolerance=pd.Timedelta("2ms"),
allow_exact_matches=False)
# print the variable
print(merged_dataframe)
Output:
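The effect of allow_exact_matches on its own can be sketched on a tiny hypothetical pair of frames, where the left key coincides exactly with one right key:

```python
import pandas as pd

# Hypothetical frames: the left key equals the second right key exactly.
left_small = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.048"]),
    "bid": [720.50],
})
right_small = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.047",
                            "2020-03-25 13:30:00.048"]),
    "price": [720.77, 720.92],
})

# Default (allow_exact_matches=True): the equal key is a valid match.
exact_ok = pd.merge_asof(left_small, right_small, on="time")

# allow_exact_matches=False: only strictly earlier keys are considered.
exact_excluded = pd.merge_asof(left_small, right_small, on="time",
                               allow_exact_matches=False)

print(exact_ok)
print(exact_excluded)
```

With exact matches excluded, the merge falls back to the row 1 ms earlier instead of the row at the identical timestamp.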
Example 4: Using the same DataFrame in both positions. Here the left DataFrame is passed as both arguments.
Python3
# Importing the required package
import pandas as pd
# Creating the DataFrame of left side
left = pd.DataFrame({
"time": [pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.030"),
pd.Timestamp("2020-03-25 13:30:00.041"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.049"),
pd.Timestamp("2020-03-25 13:30:00.072"),
pd.Timestamp("2020-03-25 13:30:00.075")
],
"ticker": ["GOOG", "MSFT", "MSFT", "MSFT", "GOOG",
"AAPL", "GOOG", "MSFT"],
"bid": [720.50, 51.95, 51.97, 51.99, 720.50, 97.99,
720.50, 52.01],
"ask": [720.93, 51.96, 51.98, 52.00, 720.93, 98.01,
720.88, 52.03]
})
# Creating the Dataframe of right side
right = pd.DataFrame({
"time": [
pd.Timestamp("2020-03-25 13:30:00.023"),
pd.Timestamp("2020-03-25 13:30:00.038"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048"),
pd.Timestamp("2020-03-25 13:30:00.048")
],
"ticker": ["MSFT", "MSFT", "GOOG", "GOOG", "AAPL"],
"price": [51.95, 51.95, 720.77, 720.92, 98.0],
"quantity": [75, 155, 100, 100, 100]
})
# Applying merge_asof on data and store it
# in a variable
merged_dataframe = pd.merge_asof(left, left, on="time",
by="ticker")
# print the variable
print(merged_dataframe)
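Output:
Because both inputs have the same column names, the result contains each overlapping column twice, distinguished by the default suffixes _x and _y: bid_x, bid_y, ask_x, ask_y.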
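If the default _x and _y suffixes are not descriptive enough, the suffixes parameter from the signature can be used to choose other names. The sketch below uses a shortened, hypothetical quotes frame and the made-up suffixes _left and _right.

```python
import pandas as pd

# Hypothetical shortened frame, merged with itself as in Example 4.
quotes = pd.DataFrame({
    "time": pd.to_datetime(["2020-03-25 13:30:00.023",
                            "2020-03-25 13:30:00.030"]),
    "ticker": ["MSFT", "MSFT"],
    "bid": [51.95, 51.97],
})

# Overlapping non-key columns get the custom suffixes instead of _x/_y.
self_merged = pd.merge_asof(quotes, quotes, on="time", by="ticker",
                            suffixes=("_left", "_right"))

print(self_merged.columns.tolist())
```

The key columns given in on and by are not duplicated; only the remaining overlapping columns receive a suffix.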