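Compare Adjacent Rows in an R data.table
The data.table package simplifies data manipulation in R, including subsetting, grouping, and updating tables.
To compare adjacent rows, first create a new column that holds, for each row, the previous value encountered within the same group (the lag). The group is specified with the by argument. The new column is filled with c(NA, x[-.N]), where x is the column the lag is computed from: x[-.N] drops the last element of the group, and the leading NA fills the first row of each group, which has no previous value.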
Syntax:
dt[, new_col_name := c(NA, x[-.N]), by = group_col]
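To make the lag construction concrete, here is a minimal sketch on a small fixed table. The table dt and the columns grp, x, prev_manual, and prev_shift are hypothetical names chosen for illustration, not part of the examples in this article. It also shows that data.table's shift() function, which by default lags by one row and fills with NA, should give the same result.

```r
library(data.table)

# Hypothetical toy data, not taken from the examples in this article
dt <- data.table(grp = c("a", "a", "b", "a", "b"),
                 x   = c(10, 20, 30, 40, 50))

# Within each group, drop the last element (x[-.N]) and prepend NA,
# so every row receives the previous value of x from its own group
dt[, prev_manual := c(NA, x[-.N]), by = grp]

# shift() defaults to a lag of one row and fills the gap with NA
dt[, prev_shift := shift(x), by = grp]

# Both new columns hold NA, 10, NA, 20, 30 in row order
print(dt)
```

Because the data is fixed rather than sampled, the result is the same on every run, which makes it easier to check the lag by hand.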
Example 1: Comparing adjacent rows in an R data.table
R
# load the required package
library("data.table")
# create a data.table with a grouping column and a value column
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))
print("original data frame")
print(data_frame)
# compute the lag of col2 within each col1 group
data_frame[, lag := c(NA, col2[-.N]), by = col1]
print("modified data frame")
print(data_frame)
Output
[1] "original data frame"
    col1 col2
 1:    b    6
 2:    c    5
 3:    a    1
 4:    d    6
 5:    d    5
 6:    b    6
 7:    b    5
 8:    a    2
 9:    c    6
10:    a    3
11:    a    4
12:    d    1
[1] "modified data frame"
    col1 col2 lag
 1:    b    6  NA
 2:    c    5  NA
 3:    a    1  NA
 4:    d    6  NA
 5:    d    5   6
 6:    b    6   6
 7:    b    5   6
 8:    a    2   1
 9:    c    6   5
10:    a    3   2
11:    a    4   3
12:    d    1   5
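Next, compute the difference between adjacent rows by subtracting the new lag column from the existing column x. The first row of each group has no previous value, so its difference is NA.
Syntax:
data_frame[, diff_col := x - new_col_name]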
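The subtraction step can be sketched on a small fixed table as follows. The names dt, grp, x, prev, diff_col, diff_shift, and increased are hypothetical and used only for illustration. The sketch also shows two optional variations, offered as assumptions about what a reader might want rather than as part of the method described here: computing the difference in a single grouped step with shift(), and turning the difference into a logical flag.

```r
library(data.table)

# Hypothetical toy data, not taken from the examples in this article
dt <- data.table(grp = c("a", "b", "a", "b", "a"),
                 x   = c(5, 8, 9, 6, 4))

# Previous value within each group, then the row-to-row difference
dt[, prev := c(NA, x[-.N]), by = grp]
dt[, diff_col := x - prev]

# The same difference in one grouped step using shift()
dt[, diff_shift := x - shift(x), by = grp]

# Flag rows whose value increased relative to the previous row in the group;
# the first row of each group stays NA
dt[, increased := diff_col > 0]

# diff_col is NA, NA, 4, -2, -5 in row order
print(dt)
```

Keeping the lag in its own column, as the syntax above does, makes the intermediate values easy to inspect; the single-step form avoids the extra column when only the difference is needed.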
Example 2: Difference between adjacent rows in an R data.table
R
# load the required package
library("data.table")
# create a data.table with a grouping column and a value column
data_frame <- data.table(col1 = sample(letters[1:4], 12, replace = TRUE),
                         col2 = sample(1:6, 12, replace = TRUE))
print("original data frame")
print(data_frame)
# compute the lag of col2 within each col1 group
data_frame[, lag := c(NA, col2[-.N]), by = col1]
# subtract the lag from the current value; := updates data_frame by
# reference, so data_mod refers to the same table
data_mod <- data_frame[, difference := col2 - lag]
print("modified data frame")
print(data_mod)
Output
[1] "original data frame"
    col1 col2
 1:    a    1
 2:    d    3
 3:    d    6
 4:    d    3
 5:    d    2
 6:    b    4
 7:    d    5
 8:    c    6
 9:    d    2
10:    b    4
11:    d    1
12:    a    6
[1] "modified data frame"
    col1 col2 lag difference
 1:    a    1  NA         NA
 2:    d    3  NA         NA
 3:    d    6   3          3
 4:    d    3   6         -3
 5:    d    2   3         -1
 6:    b    4  NA         NA
 7:    d    5   2          3
 8:    c    6  NA         NA
 9:    d    2   5         -3
10:    b    4   4          0
11:    d    1   2         -1
12:    a    6   1          5
Example 3:
R
# load the required package
library("data.table")
# create a data.table with a random grouping column and sequential values
data_frame <- data.table(col1 = sample(letters[1:4], 16, replace = TRUE),
                         col2 = 100:115)
print("original data frame")
print(data_frame)
# store the previous col2 value within each col1 group in col3
data_frame[, col3 := c(NA, col2[-.N]), by = col1]
# difference between the current value and the previous value in the group
data_mod <- data_frame[, difference := col2 - col3]
print("modified data frame")
print(data_mod)
Output
[1] "original data frame"
    col1 col2
 1:    d  100
 2:    a  101
 3:    b  102
 4:    a  103
 5:    d  104
 6:    d  105
 7:    c  106
 8:    a  107
 9:    b  108
10:    a  109
11:    b  110
12:    d  111
13:    b  112
14:    d  113
15:    c  114
16:    b  115
[1] "modified data frame"
    col1 col2 col3 difference
 1:    d  100   NA         NA
 2:    a  101   NA         NA
 3:    b  102   NA         NA
 4:    a  103  101          2
 5:    d  104  100          4
 6:    d  105  104          1
 7:    c  106   NA         NA
 8:    a  107  103          4
 9:    b  108  102          6
10:    a  109  107          2
11:    b  110  108          2
12:    d  111  105          6
13:    b  112  110          2
14:    d  113  111          2
15:    c  114  106          8
16:    b  115  112          3