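Group by One or More Variables Using dplyr in R
The group_by() function splits a data frame into groups based on the values in one or more columns. The grouping columns are passed as arguments to the function, and more than one column name may be supplied. It does not change the data itself; it returns a grouped tibble, so that later operations act on each group separately.
Syntax: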
group_by(col1, col2, …)
Example 1: Group by one variable
R
# load the required library
library("dplyr")
# create a data frame; col1 is sampled at random, so it differs between runs
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))
print("Original DataFrame")
print(data_frame)
print("Modified DataFrame")
# group the rows by the values in col1
data_frame %>% group_by(col1)
Output
[1] "Original DataFrame"
col1 col2 col3
1 6 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 7 b NA
6 6 c NA
7 6 a 2
8 6 b NA
9 7 c 2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups: col1 [2]
col1 col2 col3
1 6 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 7 b NA
6 6 c NA
7 6 a 2
8 6 b NA
9 7 c 2
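The output above shows the same nine rows with an added "Groups" header, because grouping only attaches metadata to the data frame. The following is a minimal sketch of how that metadata could be inspected. It assumes dplyr is installed, and the data frame `df` and its fixed values are hypothetical, copied from the sample output above so that the result does not depend on sample().

```r
library(dplyr)

# Hypothetical data with fixed values (no sample()) so the result is reproducible
df <- data.frame(col1 = c(6, 7, 7, 6, 7, 6, 6, 6, 7),
                 col2 = rep(letters[1:3], 3),
                 col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

grouped <- df %>% group_by(col1)

# Grouping is stored as metadata; the rows and values are unchanged
group_vars(grouped)      # names of the grouping columns
n_groups(grouped)        # number of distinct groups
is_grouped_df(grouped)   # TRUE for a grouped tibble

# ungroup() removes the grouping and returns an ordinary tibble
ungrouped <- ungroup(grouped)
is_grouped_df(ungrouped)
```

Calling ungroup() once the per-group work is done is a common precaution, since a grouping left in place silently affects later verbs such as mutate() or summarise().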
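Grouping can also be done on several columns of the data frame by passing each column name to the function. Every distinct combination of values in those columns then forms its own group.
Example 2: Group by multiple columns
R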
# load the required library
library("dplyr")
# create a data frame; col1 is sampled at random, so it differs between runs
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))
print("Original DataFrame")
print(data_frame)
print("Modified DataFrame")
# group the rows by each combination of col1 and col2
data_frame %>% group_by(col1, col2)
Output
[1] "Original DataFrame"
col1 col2 col3
1 7 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 6 b NA
6 6 c NA
7 7 a 2
8 6 b NA
9 6 c 2
[1] "Modified DataFrame"
# A tibble: 9 x 3
# Groups: col1, col2 [6]
col1 col2 col3
1 7 a 1
2 7 b 4
3 7 c 5
4 6 a 1
5 6 b NA
6 6 c NA
7 7 a 2
8 6 b NA
9 6 c 2
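Grouping becomes visible once a per-group operation follows it. The sketch below shows one possible follow-up step using summarise(). It assumes dplyr 1.0.0 or later (for the `.groups` argument); the data frame `df` is hypothetical and reuses the values from the output above, and the column names `n_rows` and `total` are chosen only for illustration.

```r
library(dplyr)

# Hypothetical data with fixed values taken from the sample output above
df <- data.frame(col1 = c(7, 7, 7, 6, 6, 6, 7, 6, 6),
                 col2 = rep(letters[1:3], 3),
                 col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

summary_df <- df %>%
  group_by(col1, col2) %>%
  summarise(n_rows = n(),                      # rows in each (col1, col2) group
            total = sum(col3, na.rm = TRUE),   # sum of col3, ignoring NA
            .groups = "drop")                  # return an ungrouped tibble

print(summary_df)
```

The result has one row per combination of col1 and col2, six rows for this data. With na.rm = TRUE, a group whose col3 values are all NA gets a total of 0 rather than NA.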