📅 Last modified: 2023-12-03 15:04:02.029000 🧑 Author: Mango
In PySpark, datetime operations are supported by the `pyspark.sql.functions` library. To add hours to a datetime column, we first convert the column to a timestamp datatype, perform the operation, and then convert the column back to its original datatype.
We can convert a column to a timestamp datatype using the `to_timestamp()` function, which takes a column and, optionally, a format string describing the input.
```python
from pyspark.sql.functions import to_timestamp

df = df.withColumn("timestamp_col", to_timestamp("date_col", "yyyy-MM-dd HH:mm:ss"))
```
The above code converts the `date_col` column to a timestamp datatype and stores the result in a new column called `timestamp_col`. The expected format of the input string is `yyyy-MM-dd HH:mm:ss`.
After converting the column to a timestamp datatype, we can add hours with a SQL `INTERVAL` expression through the `expr()` function. Note that `date_add()` cannot be used here: it shifts a date by whole days, not hours.

```python
from pyspark.sql.functions import expr

df = df.withColumn("timestamp_col", expr("timestamp_col + INTERVAL 3 HOURS"))
```

The above code adds 3 hours to the `timestamp_col` column.
After performing the operation, we can convert the `timestamp_col` column back to its original string form using the `date_format()` function, which takes two arguments: the column to format and a format string.
```python
from pyspark.sql.functions import date_format

df = df.withColumn("date_col", date_format("timestamp_col", "yyyy-MM-dd"))
```
The above code converts the `timestamp_col` column back into a `date_col` string with the format `yyyy-MM-dd`. Note that this date-only format drops the time portion, so the added hours will not be visible in the output; use a format such as `yyyy-MM-dd HH:mm:ss` to keep them.
In this article, we discussed how to add hours to a datetime column in PySpark using the `pyspark.sql.functions` library: convert the column to a timestamp datatype, add the hours, and then convert it back to its original datatype.