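Sort DataFrame rows according to a vector with a specific order in R
In this article, we will see how to sort the rows of a data frame according to the values of a vector that has a specific order. There are two functions we can use to sort data frame rows based on the values of a vector:
- match() function
- left_join() function (from the dplyr package)
Example dataset: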
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = 6:10)
data
x1 x2 x3
1 1 a 6
2 2 b 7
3 3 c 8
4 4 d 9
5 5 e 10
Vector with a specific order:
vec <- c("b", "e", "a", "c", "d")
vec
# "b" "e" "a" "c" "d"
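Method 1: Using the match() function to sort a data frame by a vector
match() returns a vector of the positions of (first) matches of its first argument in its second.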
Syntax: match(x, table, nomatch = NA_integer_, incomparables = NULL)
Parameters:
- x: vector or NULL: the values to be matched. Long vectors are supported.
- table: vector or NULL: the values to be matched against. Long vectors are not supported.
- nomatch: the value to be returned in the case when no match is found. Note that it is coerced to integer.
- incomparables: A vector of values that cannot be matched. Any value in x matching a value in this vector is assigned the nomatch value. For historical reasons, FALSE is equivalent to NULL.
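To make the behaviour of match() concrete before it is used for sorting, the sketch below (reusing the article's data and vec) prints the index vector that match(vec, data$x2) produces, and what happens to a value with no match. The extra key "z" is a hypothetical value added only for illustration.

```r
# Example data and target order, as in the article
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = 6:10)
vec <- c("b", "e", "a", "c", "d")

# Position of each element of vec within data$x2
idx <- match(vec, data$x2)
print(idx)
# [1] 2 5 1 3 4

# A value with no match ("z" is hypothetical) becomes NA by default ...
print(match(c("b", "z"), data$x2))
# [1]  2 NA

# ... or the integer supplied as nomatch
print(match(c("b", "z"), data$x2, nomatch = 0L))
# [1] 2 0
```

Indexing rows with an NA position (data[c(2, NA), ]) yields a row filled with NAs, so it may be worth checking anyNA(idx) first when vec can contain values that are absent from data$x2.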
Code:
R
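data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = 6:10)
vec <- c("b", "e", "a", "c", "d")
# match() gives the row position of each element of vec in data$x2;
# indexing the rows with those positions reorders the data frame
new_dataset <- data[match(vec, data$x2), ]
new_dataset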
Output:
x1 x2 x3
2 2 b 7
5 5 e 10
1 1 a 6
3 3 c 8
4 4 d 9
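As we can see from the output above, the rows of the new data frame are now arranged in the order given by the values of vec.
Method 2: Using the left_join() function from the dplyr package
First, we have to install and load the dplyr package. Then we can use left_join() to sort the data frame according to the values in the vector: we wrap the vector in a one-column data frame whose column name matches the key column of data, and join data onto it, so the result follows the order of the vector.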
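Indexing with match(vec, data$x2) returns exactly the rows listed in vec, in that order. A closely related base R idiom, data[order(match(data$x2, vec)), ], turns the lookup around and ranks every row of data by its position in vec. The minimal sketch below contrasts the two, assuming a hypothetical partial_vec that lists only some of the keys.

```r
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = 6:10)

# Hypothetical partial order: only three of the five keys are listed
partial_vec <- c("e", "c", "a")

# Approach 1: index by match(vec, data$x2) -> keeps only the listed keys
subset_rows <- data[match(partial_vec, data$x2), ]

# Approach 2: order by match(data$x2, vec) -> keeps every row;
# unlisted keys get NA ranks, which order() places last (na.last = TRUE)
all_rows <- data[order(match(data$x2, partial_vec)), ]

print(subset_rows)
print(all_rows)
```

With the full vec from the article, both expressions select rows 2, 5, 1, 3, 4, so they agree. They differ only when the vector is missing some keys (or contains keys that data does not have).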
Syntax: left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
Parameters:
- x, y: tbls to join
- by: a character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they’re right (to suppress the message, simply explicitly list the variables that you want to join).
- copy: If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
- suffix: If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
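The dplyr documentation states that left_join() preserves the row order of x as much as possible, which is what makes the join usable for sorting. The sketch below, assuming R 4.0 or later (where strings are not converted to factors) and a hypothetical vec_extra that contains a key ("z") not present in data, shows two side effects: a key without a partner produces a row of NAs, and the join key column comes first. Base R column indexing is used here to restore the original column order.

```r
library(dplyr)

data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = 6:10)

# Hypothetical order vector that includes a key ("z") absent from data
vec_extra <- c("b", "z", "a")

# Rows follow the order of x (the one-column data frame built from vec_extra);
# "z" has no matching row in data, so its x1 and x3 are NA
joined <- left_join(data.frame(x2 = vec_extra), data, by = "x2")

# The key column x2 comes first after the join; restore data's column order
joined <- joined[names(data)]
print(joined)
```

If the extra rows are unwanted, inner_join() with the same arguments would drop keys that are missing from data instead of filling them with NA.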
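Code:
R
# install.packages("dplyr")  # run once if dplyr is not yet installed
library("dplyr")
data <- data.frame(x1 = 1:5,
                   x2 = letters[1:5],
                   x3 = 6:10)
vec <- c("b", "e", "a", "c", "d")
# Wrap vec in a one-column data frame named like the key column of data;
# left_join() keeps the rows of its first argument in their original order
new_dataset <- left_join(data.frame(x2 = vec),
                         data,
                         by = "x2")
print(new_dataset)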
Output:
x2 x1 x3
1 b 2 7
2 e 5 10
3 a 1 6
4 c 3 8
5 d 4 9