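How to Calculate the Time Difference from the Previous Row by Group in an R Data Frame
A data frame may contain values that belong to different groups. Its columns can hold different data types, including timestamps stored as POSIXct objects. POSIXct values support arithmetic directly, so the time difference between consecutive rows within each group can be computed in the following ways: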
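As a quick illustration of the arithmetic that POSIXct values support, the sketch below subtracts two timestamps and shifts one by a number of seconds. The timestamps are taken from the examples in this article; the tz = "UTC" argument is an assumption added here so the result does not depend on the local time zone.

```r
# Two timestamps from this article's examples (tz = "UTC" is an assumption)
t1 <- as.POSIXct("2021-05-08 08:32:07", tz = "UTC")
t2 <- as.POSIXct("2021-05-11 18:32:07", tz = "UTC")

# Subtracting POSIXct values yields a difftime; R chooses the unit automatically
gap <- t2 - t1
print(gap)

# Adding a plain number shifts the timestamp by that many seconds
later <- t1 + 3600
print(later)

# Convert the difftime to a plain number of seconds when a fixed unit is needed
gap_secs <- as.numeric(gap, units = "secs")
print(gap_secs)
```

Because the automatically chosen unit can vary with the size of the gap, converting to an explicit unit is usually the safer choice when the values are compared or stored.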
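Method 1: Using the dplyr package
The group_by() function splits the data into groups based on the values in one or more columns. The grouping columns are passed as arguments to the function, and more than one column name may be given.
Syntax:
group_by(col1, col2, …)
Next, mutate() is applied to add a new column or modify an existing one; the new column is given as a name = expression pair. The difference from the previous row can be computed with dplyr's lag() function, which returns the preceding values of a vector.
Syntax:
lag(x, n = 1L, default = NA)
Parameters:
- x – A vector of values
- n – Number of positions to lag by
- default (Default: NA) – The value used for non-existent rows
The first row of each group has no previous row, so its difference is NA.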
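To see what lag() does on its own, here is a minimal sketch on a single ungrouped vector. The timestamps are hypothetical values chosen for illustration, and tz = "UTC" is an assumption made for reproducibility.

```r
library(dplyr)

# Hypothetical timestamps (tz = "UTC" is an assumption)
times <- as.POSIXct(c("2021-05-08 08:00:00",
                      "2021-05-08 08:00:30",
                      "2021-05-08 08:02:00"), tz = "UTC")

# lag() shifts the vector down by one position; the first slot becomes NA
previous <- dplyr::lag(times)
print(previous)

# Each element minus the one before it, expressed in seconds
gaps <- as.numeric(difftime(times, previous, units = "secs"))
print(gaps)
```

Inside a grouped mutate(), the same shift is applied separately to each group, which is why every group starts with NA.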
Example:
R
library(dplyr)

# Create a data frame with a group column (col1) and a POSIXct column (col3)
data_frame <- data.frame(
  col1 = sample(6:9, 5, replace = TRUE),
  col3 = c(as.POSIXct("2021-05-08 08:32:07"),
           as.POSIXct("2021-07-18 00:21:07"),
           as.POSIXct("2020-11-28 23:32:09"),
           as.POSIXct("2021-05-11 18:32:07"),
           as.POSIXct("2021-05-08 08:32:07"))
)
print("Original DataFrame")
print(data_frame)

# Compute the difference from the previous row within each group;
# sorting first makes lag() return the chronologically previous timestamp
data_frame %>%
  arrange(col1, col3) %>%
  group_by(col1) %>%
  mutate(diff = col3 - lag(col3))
Output
[1] "Original DataFrame"
col1 col3
1 8 2021-05-08 08:32:07
2 8 2021-07-18 00:21:07
3 7 2020-11-28 23:32:09
4 6 2021-05-11 18:32:07
5 7 2021-05-08 08:32:07
# A tibble: 5 x 3
# Groups: col1 [3]
col1 col3 diff
1 6 2021-05-11 18:32:07 NA secs
2 7 2020-11-28 23:32:09 NA secs
3 7 2021-05-08 08:32:07 13856398 secs
4 8 2021-05-08 08:32:07 NA secs
5 8 2021-07-18 00:21:07 6104940 secs
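Method 2: Using tapply()
The tapply() function applies a function to subsets of a vector, where the subsets are defined by the levels of one or more grouping factors. The function, which may be user-defined or built in, is called once per group rather than once per element.
Syntax:
tapply(X, INDEX, FUN)
Parameters:
- X – An R object, typically vector-like, that allows subsetting with [
- INDEX – A list of one or more factors, each of the same length as X. The elements are coerced to factors by as.factor.
- FUN – The function to be applied
Here, the function computes the differences between consecutive timestamps in seconds. The first row of each group has no previous row, so its value is set to zero.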
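The grouping behaviour of tapply() is easier to see on plain numbers. The sketch below uses hypothetical values and group labels that are not part of the original example.

```r
# Hypothetical readings and group labels, used only for illustration
values <- c(10, 4, 7, 1, 12)
groups <- c("a", "b", "a", "b", "a")

# tapply() splits `values` by `groups` and calls FUN once per group
totals <- tapply(values, INDEX = groups, FUN = sum)
print(totals)

# When FUN returns a vector, the result holds one vector per group
steps <- tapply(values, INDEX = groups, FUN = function(x) c(0, diff(x)))
print(steps[["a"]])
print(steps[["b"]])
```

Note that the results come back ordered by group, not by the original row positions. This matters when the per-group vectors are flattened with unlist() and written back into a data frame.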
Example:
R
# Create a data frame with a group column (col1) and a POSIXct column (col3)
data_frame <- data.frame(
  col1 = sample(6:9, 5, replace = TRUE),
  col3 = c(as.POSIXct("2021-05-08 08:32:07"),
           as.POSIXct("2021-07-18 00:21:07"),
           as.POSIXct("2020-11-28 23:32:09"),
           as.POSIXct("2021-05-11 18:32:07"),
           as.POSIXct("2021-05-08 08:32:07"))
)
print("Original DataFrame")
print(data_frame)

# tapply() returns its results ordered by group, so sort the rows by group
# (and by time within each group) to keep each value aligned with its row
data_frame <- data_frame[order(data_frame$col1, data_frame$col3), ]

# Compute the difference within each group in seconds; the first row of each group is 0
data_frame$diff <- unlist(tapply(data_frame$col3, INDEX = data_frame$col1,
                                 FUN = function(x) c(0, `units<-`(diff(x), "secs"))))
print("Modified DataFrame")
print(data_frame)
Output
[1] "Original DataFrame"
  col1                col3
1    7 2021-05-08 08:32:07
2    6 2021-07-18 00:21:07
3    8 2020-11-28 23:32:09
4    7 2021-05-11 18:32:07
5    6 2021-05-08 08:32:07
[1] "Modified DataFrame"
  col1                col3    diff
5    6 2021-05-08 08:32:07       0
2    6 2021-07-18 00:21:07 6104940
1    7 2021-05-08 08:32:07       0
4    7 2021-05-11 18:32:07  295200
3    8 2020-11-28 23:32:09       0
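When the original row order has to be preserved, base R's ave() is one possible alternative to tapply(). It is not the approach used in this section, only a hedged sketch of a related technique. The data frame df, its column names, and the timestamps are hypothetical, and tz = "UTC" is an assumption made for reproducibility.

```r
# Hypothetical data whose groups are interleaved (tz = "UTC" is an assumption)
df <- data.frame(
  grp = c(7, 6, 7, 6),
  ts  = as.POSIXct(c("2021-05-08 08:00:00", "2021-05-08 09:00:00",
                     "2021-05-08 08:10:00", "2021-05-08 09:00:45"), tz = "UTC")
)

# ave() applies FUN within each group and returns a vector in the original row order.
# as.numeric() converts the timestamps to seconds since the epoch, so diff() yields seconds.
df$diff_secs <- ave(as.numeric(df$ts), df$grp, FUN = function(x) c(0, diff(x)))
print(df)
```

Each value stays on the row it was computed for, so no sorting step is needed before the assignment.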
Method 3: Using data.table
A new column can be added to a data.table to hold the time difference between rows. The difftime() function computes the interval between two date-time values.
Syntax:
difftime(t1, t2, units)
Parameters:
- t1, t2 – Date-time or date objects
- units – A character string giving the units in which the result is returned
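A short sketch of difftime() with an explicit units argument follows. The two timestamps come from this article's examples; tz = "UTC" is an assumption added so the output does not depend on the local time zone.

```r
# Two timestamps from this article's examples (tz = "UTC" is an assumption)
t1 <- as.POSIXct("2021-05-11 18:32:07", tz = "UTC")
t2 <- as.POSIXct("2021-05-08 08:32:07", tz = "UTC")

# difftime() returns t1 - t2 expressed in the requested unit
in_secs  <- difftime(t1, t2, units = "secs")
in_hours <- difftime(t1, t2, units = "hours")
print(in_secs)
print(in_hours)

# Swapping the arguments flips the sign
reversed <- difftime(t2, t1, units = "hours")
print(reversed)
```

Fixing the unit keeps every row of the resulting column on the same scale, whatever the size of the individual gaps.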
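To obtain the previous timestamp, which is passed as t2 to difftime(), the shift() function is used. It introduces a lead or lag in the specified vector or list; by default it lags by one position.
Syntax:
shift(x, fill)
Parameters:
- x – A vector, list, data.frame or data.table
- fill – The value used to pad the positions left empty by the shift
The by argument groups the rows by the specified column, so the shift is applied within each group. Padding with the group's first timestamp (fill = col3[1L]) makes the first difference in each group zero.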
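The effect of shift() together with by can be inspected by storing the lagged value in its own column. This is a minimal sketch on a hypothetical table; the names grp, ts, prev and gap are illustrative, and tz = "UTC" is an assumption made for reproducibility.

```r
library(data.table)

# Hypothetical table with two groups (tz = "UTC" is an assumption)
dt <- data.table(
  grp = c("a", "a", "b", "b", "b"),
  ts  = as.POSIXct(c("2021-05-08 08:00:00", "2021-05-08 08:00:20",
                     "2021-05-08 09:00:00", "2021-05-08 09:05:00",
                     "2021-05-08 09:05:10"), tz = "UTC")
)

# shift() lags ts by one row within each group; fill = ts[1L] pads the first
# row of a group with that group's own first timestamp
dt[, prev := shift(ts, fill = ts[1L]), by = grp]

# Difference between each timestamp and the previous one, in seconds
dt[, gap := difftime(ts, prev, units = "secs")]
print(dt)
```

Keeping prev as a visible column is only for inspection; the lagged value can also be passed straight into difftime() in a single step.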
Example:
R
library("data.table")

# Create a data.table with a group column (col1) and a POSIXct column (col3)
dt <- data.table(
  col1 = sample(6:9, 5, replace = TRUE),
  col3 = c(as.POSIXct("2021-05-08 08:32:07"),
           as.POSIXct("2021-07-18 00:21:07"),
           as.POSIXct("2020-11-28 23:32:09"),
           as.POSIXct("2021-05-11 18:32:07"),
           as.POSIXct("2021-05-08 08:32:07"))
)
print("Original DataFrame")
print(dt)

# Compute the difference from the previous row within each group, in seconds
dt[, diff := difftime(col3, shift(col3, fill = col3[1L]),
                      units = "secs"), by = col1]
print("Modified DataFrame")
print(dt)
Output
[1] "Original DataFrame"
col1 col3
1: 7 2021-05-08 08:32:07
2: 7 2021-07-18 00:21:07
3: 8 2020-11-28 23:32:09
4: 8 2021-05-11 18:32:07
5: 8 2021-05-08 08:32:07
[1] "Modified DataFrame"
col1 col3 diff
1: 7 2021-05-08 08:32:07 0 secs
2: 7 2021-07-18 00:21:07 6104940 secs
3: 8 2020-11-28 23:32:09 0 secs
4: 8 2021-05-11 18:32:07 14151598 secs
5: 8 2021-05-08 08:32:07 -295200 secs