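How to Remove the Timezone from a Timestamp Column in a Pandas DataFrame
The world is nominally divided into 24 time zones. Different time zones exist because the sun does not light the whole Earth at once. In many cases, though, we do not need time zone information, particularly when all the data resides in a single location, such as a shared server or our local system. In this article, we will see how to remove the time zone from a timestamp column in a Pandas DataFrame.
Create a DataFrame for the demonstration: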
Python
import pandas as pd
from datetime import datetime, timezone

# CREATE THE PANDAS DATAFRAME
# WITH TIMESTAMP COLUMN
df = pd.DataFrame({
    "orderNo": [
        "4278954",
        "3473895",
        "8763762",
        "4738289",
        "1294394"
    ],
    "timestamp": [
        datetime.strptime("2021-06-01",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-02",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-03",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-04",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-05",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc)
    ]
})

# PRINT THE DATATYPES OF
# EACH COLUMN OF DATAFRAME
print(df.dtypes)

# VIEW THE DATAFRAME
print(df)
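Output:
The first part of the output tells us that the timestamp column holds DateTime values. The UTC inside the square brackets of the dtype indicates that time zone information is included and that the values are UTC timestamps. This is because we supplied UTC as the time zone when building the column.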
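Instead of reading the dtype by eye, the time zone of a column can also be checked programmatically through the `.dt.tz` accessor. The following is a minimal sketch. The small frame `demo` is a made-up example for illustration, not the article's data.

```python
import pandas as pd

# Hypothetical two-row frame used only for this check.
demo = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-06-01", "2021-06-02"], utc=True)
})

# .dt.tz returns the time zone of a tz-aware column,
# or None when the column is timezone-naive.
aware_tz = demo["timestamp"].dt.tz
naive_tz = demo["timestamp"].dt.tz_localize(None).dt.tz

print(aware_tz)
print(naive_tz)
```

A result of `None` means the column carries no time zone information.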
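Method 1: Using the datetime.replace() method
The datetime.replace() function returns a copy of a DateTime object with the given fields replaced by new values.
Syntax: datetime_object.replace(tzinfo=new_tzinfo)
Parameters:
- tzinfo: New time zone info. Pass None to remove the time zone.
Returns: A new datetime object with the replaced fields; the original object is not changed.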
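To make the syntax above concrete, here is a minimal sketch that calls replace() on a single datetime value, outside of any DataFrame. The variable names `aware` and `naive` are chosen for illustration only.

```python
from datetime import datetime, timezone

# A timezone-aware datetime in UTC.
aware = datetime(2021, 6, 1, 12, 30, tzinfo=timezone.utc)

# replace() returns a new object; the original stays unchanged.
naive = aware.replace(tzinfo=None)

print(aware)  # 2021-06-01 12:30:00+00:00
print(naive)  # 2021-06-01 12:30:00
```

Note that replace(tzinfo=None) only drops the offset. It does not shift the clock time.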
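Now we will create a function that removes the time zone using the replace() method from the datetime module. The function is applied to every record in the timestamp column.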
Python
import pandas as pd
from datetime import datetime, timezone

# CREATE THE DATAFRAME
df = pd.DataFrame({
    "orderNo": [
        "4278954",
        "3473895",
        "8763762",
        "4738289",
        "1294394"
    ],
    "timestamp": [
        datetime.strptime("2021-06-01",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-02",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-03",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-04",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime("2021-06-05",
                          "%Y-%m-%d").replace(tzinfo=timezone.utc)
    ]
})

# PRINT THE DATATYPE OF
# EACH COLUMN BEFORE MANIPULATION
print(df.dtypes)


# FUNCTION TO REMOVE TIMEZONE
def remove_timezone(dt):
    # HERE `dt` is a single value from the column: a pandas
    # Timestamp (a datetime subclass), so .replace() is available
    return dt.replace(tzinfo=None)


# APPLY THE ABOVE FUNCTION TO
# REMOVE THE TIMEZONE INFORMATION
# FROM EACH RECORD OF TIMESTAMP COLUMN IN DATAFRAME
df['timestamp'] = df['timestamp'].apply(remove_timezone)

# PRINT THE DATATYPE OF
# EACH COLUMN AFTER MANIPULATION
print(df.dtypes)
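Output:
In the output, we can see that before the manipulation the DateTime column, i.e. the "timestamp" column, carries UTC time zone information. After applying the remove_timezone function to each record of the timestamp column, no UTC information remains in the DataFrame. Each value in the "timestamp" column is a pandas Timestamp, which is a subclass of the Python datetime class. So when each of these values is passed to the remove_timezone() function, the function can call the replace() method that comes from the Python datetime module.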
Method 2: Using Pandas
We can achieve the same result without relying on the replace() method of the datetime module. Let's see how:
Python
import pandas as pd
from datetime import datetime, timezone

# CREATE THE DATAFRAME
df = pd.DataFrame({
    "orderNo": [
        "4278954",
        "3473895",
        "8763762",
        "4738289",
        "1294394"
    ],
    "timestamp": [
        datetime.strptime(
            "2021-06-01", "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime(
            "2021-06-02", "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime(
            "2021-06-03", "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime(
            "2021-06-04", "%Y-%m-%d").replace(tzinfo=timezone.utc),
        datetime.strptime(
            "2021-06-05", "%Y-%m-%d").replace(tzinfo=timezone.utc)
    ]
})

# PRINT THE DATATYPE OF EACH COLUMN BEFORE
# MANIPULATION
print(df.dtypes)

# REMOVING THE TIMEZONE INFORMATION
df['timestamp'] = df['timestamp'].dt.tz_localize(None)

# PRINT THE DATATYPE OF EACH COLUMN AFTER
# MANIPULATION
print(df.dtypes)
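Output:
In the example above, we can see that the dt.tz_localize(None) method can be applied to a DataFrame column to remove the time zone information. As in the previous example, the output shows that the timestamp column no longer carries UTC time zone information after the manipulation.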
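The example above uses UTC data, where dropping the zone leaves the clock time unchanged. For data in another zone, it may help to see how dt.tz_localize(None) differs from the related dt.tz_convert(None). The sketch below uses a made-up one-row series at a fixed UTC+05:30 offset. The names `ist`, `s`, `local_naive`, and `utc_naive` are for illustration only.

```python
import pandas as pd
from datetime import timedelta, timezone

# Hypothetical fixed-offset zone (UTC+05:30), for illustration only.
ist = timezone(timedelta(hours=5, minutes=30))

# A one-row tz-aware series: 09:00 local time at UTC+05:30.
s = pd.Series(pd.to_datetime(["2021-06-01 09:00"])).dt.tz_localize(ist)

# tz_localize(None) keeps the local wall-clock time and drops the zone.
local_naive = s.dt.tz_localize(None)

# tz_convert(None) converts to UTC first, then drops the zone.
utc_naive = s.dt.tz_convert(None)

print(local_naive.iloc[0])  # 2021-06-01 09:00:00
print(utc_naive.iloc[0])    # 2021-06-01 03:30:00
```

Which of the two is appropriate depends on whether the naive values should represent local time or UTC.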