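How to Aggregate Multiple Columns in R?
In this article, we discuss how to aggregate multiple columns in the R programming language.
Aggregating means combining two or more values into a single summary value. Here we use the aggregate() function to get summary statistics for one or more variables in a data frame.
Syntax: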
aggregate(sum_column ~ group_column, data, FUN)
where,
- data is the input data frame
- sum_column is the column to summarize
- group_column is the column to group by
- FUN is the summary function, such as sum, mean, min, or max
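As a minimal sketch of how the FUN argument changes the result, the block below applies mean and max to the same grouping. The data frame `scores` and its columns `team` and `points` are hypothetical names chosen only for illustration.

```r
# Hypothetical data: points scored by members of two teams
scores <- data.frame(team = c("a", "a", "b", "b"),
                     points = c(10, 20, 30, 50))

# Mean of points within each team
mean_by_team <- aggregate(points ~ team, scores, FUN = mean)

# Maximum of points within each team
max_by_team <- aggregate(points ~ team, scores, FUN = max)

print(mean_by_team)
print(max_by_team)
```

Only FUN differs between the two calls; the formula and the data stay the same.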
Example:
Let's create a data frame:
R
# create the data frame with 4 columns
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))
# display
data
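Output:
  subjects id   names marks
1     java  1   manoj    89
2   python  2     sai    89
3     java  3 mounika    76
4     java  4   durga    89
5      php  5 deepika    90
6      php  6  roshan    67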
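Example 1: Summarize one variable grouped by one variable
Here we summarize a single variable, grouped by a single variable.
Syntax:
aggregate(sum_column ~ group_column, data, FUN=sum)
In this example, we use the sum function to get the total marks for each subject.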
R
# create the data frame with 4 columns
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))
# get sum of marks, grouped by subjects
aggregate(marks ~ subjects, data, FUN = sum)
Output:
  subjects marks
1     java   254
2      php   157
3   python    89
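One way to cross-check the grouped sums is tapply(), a base R function that applies a function to a vector split by a grouping vector. The sketch below is an illustrative check rather than part of the original example; the variable names `by_formula`, `by_tapply`, and `agree` are introduced here.

```r
# Same data frame as in the example above
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))

# Grouped sum through the formula interface of aggregate()
by_formula <- aggregate(marks ~ subjects, data, FUN = sum)

# Grouped sum through tapply(); the result is named by subject
by_tapply <- tapply(data$marks, data$subjects, sum)

# Look up each subject by name and compare the two results
agree <- all(by_formula$marks == by_tapply[as.character(by_formula$subjects)])
print(agree)
```

aggregate() returns a data frame, which is convenient for further processing, while tapply() returns a named array.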
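Example 2: Summarize one variable grouped by multiple variables
Here we summarize a single variable, grouped by two or more variables. The grouping columns are joined with the + operator on the right-hand side of the formula.
Syntax:
aggregate(sum_column ~ group_column1 + group_column2 + ... + group_columnN, data, FUN=sum)
In this example, we group by subjects and names to get the sum of marks.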
R
# create the data frame with 4 columns
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))
# get sum of marks, grouped by subjects and names
aggregate(marks ~ subjects + names, data, FUN = sum)
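Output:
  subjects   names marks
1      php deepika    90
2     java   durga    89
3     java   manoj    89
4     java mounika    76
5      php  roshan    67
6   python     sai    89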
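Because every subjects/names pair occurs only once in this data frame, each group contains a single row, so the grouped sums equal the original marks. A minimal sketch that makes this visible is to count the rows per group with FUN = length; the variable name `group_sizes` is introduced here for illustration.

```r
# Same data frame as in the example above
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))

# Count the rows in each subjects/names group.
# The marks column of the result holds the count, not a sum.
group_sizes <- aggregate(marks ~ subjects + names, data, FUN = length)
print(group_sizes)
```

Grouping by several columns only collapses rows when the same combination of values appears more than once.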
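Example 3: Summarize multiple variables grouped by one variable
Here we summarize two or more variables, grouped by a single variable. We use the cbind() (column bind) function on the left-hand side of the formula to list the columns to summarize.
Syntax:
aggregate(cbind(sum_column1, sum_column2, ..., sum_columnN) ~ group_column, data, FUN=sum)
In this example, we get the sums of marks and id for each subject.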
R
# create the data frame with 4 columns
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))
# get sums of marks and id, grouped by subjects
aggregate(cbind(marks, id) ~ subjects, data, FUN = sum)
Output:
  subjects marks id
1     java   254  8
2      php   157 11
3   python    89  2
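The same summary can also be written without a formula, by passing the columns to summarize as a data frame and the grouping column through the by argument of aggregate(). The sketch below compares the two forms; the variable names `via_formula` and `via_by` are introduced here for illustration.

```r
# Same data frame as in the example above
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))

# Formula interface: cbind() lists the columns to summarize
via_formula <- aggregate(cbind(marks, id) ~ subjects, data, FUN = sum)

# Non-formula interface: columns to summarize first, groups in a named list
via_by <- aggregate(data[c("marks", "id")],
                    by = list(subjects = data$subjects),
                    FUN = sum)

print(via_formula)
print(via_by)
```

The by form can be handy when the column names are held in a character vector instead of being typed into a formula.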
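Example 4: Summarize multiple variables grouped by multiple variables
Here we summarize two or more variables, grouped by two or more variables. cbind() lists the columns to summarize on the left-hand side of the formula, and the + operator joins the grouping columns on the right-hand side.
Syntax:
aggregate(cbind(sum_column1, ..., sum_columnN) ~ group_column1 + ... + group_columnN, data, FUN=sum)
In this example, we get the sums of marks and id, grouped by subjects and names.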
R
# create the data frame with 4 columns
data <- data.frame(subjects = c("java", "python", "java",
                                "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika",
                             "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))
# get sums of marks and id, grouped by
# subjects and names
aggregate(cbind(marks, id) ~ subjects + names, data, FUN = sum)
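Output:
  subjects   names marks id
1      php deepika    90  5
2     java   durga    89  4
3     java   manoj    89  1
4     java mounika    76  3
5      php  roshan    67  6
6   python     sai    89  2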
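One caveat when summarizing several columns with cbind(): the formula interface of aggregate() uses na.action = na.omit by default, so a row with a missing value in any listed column is dropped before grouping. The sketch below illustrates this under an assumption: `data_na` is a hypothetical copy of the data frame with one id set to NA, and `dropped` and `kept` are names introduced here.

```r
# Hypothetical variant of the data frame: the first id is missing
data_na <- data.frame(subjects = c("java", "python", "java",
                                   "java", "php", "php"),
                      id = c(NA, 2, 3, 4, 5, 6),
                      marks = c(89, 89, 76, 89, 90, 67))

# Default behaviour: the whole first row is omitted,
# so its marks value is missing from the java total as well
dropped <- aggregate(cbind(marks, id) ~ subjects, data_na, FUN = sum)

# Keep all rows and let sum() skip the missing values instead
kept <- aggregate(cbind(marks, id) ~ subjects, data_na, FUN = sum,
                  na.rm = TRUE, na.action = na.pass)

print(dropped)
print(kept)
```

Here na.rm = TRUE is forwarded to sum(), while na.action = na.pass stops aggregate() from discarding incomplete rows.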