How to Filter DataFrame Rows Based on the Date in Pandas?
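Different regions follow different date conventions (YYYY-MM-DD, YYYY-DD-MM, DD/MM/YY, etc.), which makes such strings hard to work with in data. The Pandas to_datetime() function converts dates and times stored as strings into the datetime64 type. This data type makes it easy to extract date and time features ranging from the year down to the microsecond.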
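A minimal sketch of how to_datetime() can handle the regional conventions listed above. The sample strings are illustrative, not taken from the article's data. Passing an explicit format string removes ambiguity. Without it, pandas reads a string such as 09/08/20 month-first by default (dayfirst=False).

```python
import pandas as pd

# The same day (9 August 2020) written in three conventions; sample strings are illustrative
iso = pd.Series(['2020-08-09'])   # YYYY-MM-DD
ydm = pd.Series(['2020-09-08'])   # YYYY-DD-MM
dmy = pd.Series(['09/08/20'])     # DD/MM/YY

# An explicit format string tells to_datetime() which field is which
a = pd.to_datetime(iso, format='%Y-%m-%d')
b = pd.to_datetime(ydm, format='%Y-%d-%m')
c = pd.to_datetime(dmy, format='%d/%m/%y')

print(a[0], b[0], c[0])  # 2020-08-09 00:00:00 printed three times

# Once converted, date parts can be read through the .dt accessor
print(a.dt.year[0], a.dt.month[0], a.dt.day[0])  # 2020 8 9
```

All three inputs end up as the same datetime64 value, so later comparisons and filters no longer depend on how the original strings were written.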
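To filter rows by date, first convert the dates in the DataFrame to the datetime64 type. Then use DataFrame.loc[] or DataFrame.query() from the Pandas package to specify the filter condition. The result is a subset of the data, i.e. the filtered DataFrame. Let's look at some examples.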
We will use a sample DataFrame that contains the number of posts on particular dates. Convert the dates in the sample data to the datetime64 type as shown below.
Python
# Import Pandas package
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25', '2020-09-05',
                            '2020-09-12', '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02', '2020-12-10',
                            '2020-12-18']})
# Convert the date strings to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# Display dataframe (print() also works outside a notebook)
print(df)
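Before filtering, it can help to confirm that the conversion actually happened. A minimal sketch, using a shortened copy of the sample data (first three rows only), that checks the column type with pd.api.types.is_datetime64_any_dtype():

```python
import pandas as pd

# Shortened copy of the sample data above (first three rows only)
df = pd.DataFrame({'num_posts': [4, 6, 3],
                   'date': ['2020-08-09', '2020-08-25', '2020-09-05']})

# Before conversion the column holds plain strings
before = pd.api.types.is_datetime64_any_dtype(df['date'])

df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# After conversion the column is a datetime64 column
after = pd.api.types.is_datetime64_any_dtype(df['date'])
print(before, after)  # False True
print(df['date'].dtype)
```

If the check is still False, comparisons such as df['date'] >= '2020-09-01' would compare strings rather than dates, and the .dt accessor would raise an error.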
Example 1:
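Filter data by date using DataFrame.loc[]. loc[] accesses a group of rows and columns of a DataFrame by label or by boolean array. In this example, the conditional statement inside loc[] returns a boolean array that is True for rows meeting the condition (date on or after September 1 and before September 15) and False otherwise. loc[] then returns only the rows whose value is True.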
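The boolean array described above can be inspected directly. A minimal sketch that stores the condition in a separate variable (named mask here purely for illustration) before passing it to loc[], using the same dates as the sample data:

```python
import pandas as pd

df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': pd.to_datetime(['2020-08-09', '2020-08-25',
                                           '2020-09-05', '2020-09-12',
                                           '2020-09-29', '2020-10-15',
                                           '2020-11-21', '2020-12-02',
                                           '2020-12-10', '2020-12-18'])})

# Each comparison yields a boolean Series; & combines them element-wise
mask = (df['date'] >= '2020-09-01') & (df['date'] < '2020-09-15')
print(mask.tolist())
# [False, False, True, True, False, False, False, False, False, False]

# loc[] keeps only the rows where the mask is True
print(df.loc[mask])
```

The parentheses around each comparison are required: in Python, & binds more tightly than >= and <, so leaving them out would combine the wrong operands.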
Python3
# Import Pandas package
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02',
                            '2020-12-10', '2020-12-18']})
# Convert the date strings to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# Filter data between two dates (Sept 1 inclusive, Sept 15 exclusive)
filtered_df = df.loc[(df['date'] >= '2020-09-01')
                     & (df['date'] < '2020-09-15')]
# Display
print(filtered_df)
Output:
Example 2:
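Use the Series .dt accessor, which exposes the same attributes as a DatetimeIndex, to access individual datetime attributes such as year, month, day, weekday, hour, minute, second, and microsecond, and use them as conditions inside loc[], as shown below.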
Note: The date values must be in datetime64 format.
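A short sketch of a few .dt attributes on three of the sample dates. It is useful for checking which integer a weekday condition should use: pandas numbers weekdays from Monday = 0 to Sunday = 6, so Tuesday is 1.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2020-08-25', '2020-09-29', '2020-12-02']))

# Pull individual calendar fields out of each datetime64 value
parts = pd.DataFrame({'date': dates,
                      'year': dates.dt.year,
                      'month': dates.dt.month,
                      'day': dates.dt.day,
                      'weekday': dates.dt.weekday,      # Monday=0 ... Sunday=6
                      'day_name': dates.dt.day_name()})
print(parts)
```

Printing dt.day_name() next to dt.weekday is a quick way to confirm that a condition such as dt.weekday == 1 really selects the intended day.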
Python3
# Import Pandas package
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02',
                            '2020-12-10', '2020-12-18']})
# Convert the date strings to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# Filter data for a specific month (December)
filtered_df = df.loc[df['date'].dt.month == 12]
# Display
print("\nPosts in December:")
print(filtered_df)
# Filter data for a specific weekday (Tuesday); weekday counts Monday=0, so Tuesday=1
filtered_df = df.loc[df['date'].dt.weekday == 1]
# Display
print("\nPosts on all Tuesdays:")
print(filtered_df)
Output:
Example 3:
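Filter data by date using DataFrame.query(). query() filters a Pandas DataFrame by selecting the rows that satisfy a condition written inside a quoted string. In the example below, the condition inside query() selects the rows whose dates fall in August (a date range is specified). DataFrame columns are placed in the query namespace by default, so the date column can be referenced by its name alone, without indexing.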
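Because the query namespace holds the column names, ordinary Python variables are referenced inside the string with an @ prefix. A minimal sketch in which the date bounds live in variables; the names start, end, and august are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': pd.to_datetime(['2020-08-09', '2020-08-25',
                                           '2020-09-05', '2020-09-12',
                                           '2020-09-29', '2020-10-15',
                                           '2020-11-21', '2020-12-02',
                                           '2020-12-10', '2020-12-18'])})

# Hypothetical bounds held in ordinary Python variables
start = pd.Timestamp('2020-08-01')
end = pd.Timestamp('2020-09-01')

# 'date' resolves to the column; '@start' and '@end' resolve to local variables
august = df.query("date >= @start and date < @end")
print(august)
```

Keeping the bounds in variables avoids rebuilding the query string when the date range changes.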
Python3
# Import Pandas package
import pandas as pd
# Create a sample dataframe
df = pd.DataFrame({'num_posts': [4, 6, 3, 9, 1, 14, 2, 5, 7, 2],
                   'date': ['2020-08-09', '2020-08-25',
                            '2020-09-05', '2020-09-12',
                            '2020-09-29', '2020-10-15',
                            '2020-11-21', '2020-12-02',
                            '2020-12-10', '2020-12-18']})
# Convert the date strings to datetime64
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# Filter data for August 2020 (Aug 1 inclusive, Sept 1 exclusive)
filtered_df = df.query("date >= '2020-08-01' "
                       "and date < '2020-09-01'")
# Display
print(filtered_df)
Output: