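How to Extract a Random Sample of Rows from an R DataFrame Using Nested Conditions
In this article, we will learn how to extract a random sample of rows from a DataFrame in the R programming language using nested conditions.
Method 1: Using sample()
We will use the sample() function for this task. sample() draws a random sample from the object passed as its first argument, which can be a vector or a single positive integer; given a single positive integer x, it samples from 1:x.
The other function we will use is which(). It lets us supply the condition that the sampled rows must satisfy. which() returns the indices of the elements for which the given condition is TRUE.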
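To make the behavior of which() concrete, here is a minimal sketch using two small numeric vectors (the variable names are illustrative only). It shows that which() returns positions rather than values, and that positions where the comparison evaluates to NA are dropped.

```r
# Illustrative vectors (names chosen for this sketch only)
year   <- c(10, 51, 19, 126, 99)
length <- c(40, NA, NA, 100, 95)

# which() returns the positions where the condition is TRUE
idx_year <- which(year > 50)

# Comparing NA with a number yields NA; which() leaves those positions out
idx_length <- which(length > 50)

print(idx_year)
print(idx_length)
```

Because the result is a vector of row positions, it can be passed directly to sample() and then used to index a data frame.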
Syntax: df[ sample(which ( conditions ) ,n), ]
Parameters:
- df: DataFrame
- n: number of samples to be generated
- conditions: samples are extracted according to this condition. Ex: df$year > 5
Data frame in use:
| | name | year | length | education |
|---|---|---|---|---|
| 1 | Welcome | 10 | 40 | yes |
| 2 | to | 51 | NA | yes |
| 3 | Geeks | 19 | NA | no |
| 4 | for | 126 | 100 | no |
| 5 | Geeks | 99 | 95 | yes |
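To implement this approach, first create the data frame, then use which() to find the rows that satisfy the condition, pass those indices to sample(), and use the result to subset the data frame. An implementation using the data frame above is shown below.
Example 1:
R
# Create the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df
# Randomly pick 2 of the rows where year > 5
print("2 samples")
df[sample(which(df$year > 5), 2), ]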
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
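One edge case in the df[sample(which(conditions), n), ] pattern deserves care. When which() returns a single index x greater than 1, sample() treats it as the range 1:x instead of a one-element vector, so rows that do not match the condition can be returned. sample() also raises an error when n is larger than the number of matching rows. The sketch below works around both issues; sample_rows is a hypothetical helper written for this illustration, not a base R function.

```r
# Hypothetical helper (not part of base R): safely draw up to n indices from idx
sample_rows <- function(idx, n) {
  n <- min(n, length(idx))          # never request more rows than matched
  idx[sample.int(length(idx), n)]   # sample positions within idx, then map back
}

df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

idx <- which(df$year > 100)          # only row 4 matches
picked <- df[sample_rows(idx, 2), ]  # returns the single matching row
print(picked)
```

With plain sample(idx, 2), the same call would draw two values from 1:4 and could return rows whose year is not above 100.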
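Example 2:
R
# Create the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df
# Randomly pick 3 of the rows where education is not "no".
# Exactly 3 rows match, so all of them are returned in random order.
print("3 samples")
df[sample(which(df$education != "no"), 3), ]
Output: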
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "3 samples"
name year length education
5 Geeks 99 95 yes
1 Welcome 10 40 yes
2 to 51 NA yes
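Both examples above filter on a single condition. A nested condition combines several comparisons with & (and), | (or) and parentheses inside which(). The following is a minimal sketch on the same data frame; the particular condition and the seed value are assumptions chosen only for illustration.

```r
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

set.seed(42)  # arbitrary seed, used here only to make the draw repeatable

# Nested condition: year above 15 AND (education is "yes" OR length is at least 100)
idx <- which(df$year > 15 & (df$education == "yes" | df$length >= 100))

# Rows 2, 4 and 5 qualify; row 3 evaluates to NA and is dropped by which()
picked <- df[sample(idx, 2), ]
print(picked)
```

Row 3 is excluded because its length is NA and its education is "no", so the inner expression is NA rather than TRUE.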
Method 2: Using the sample_n() function
The sample_n() function from the dplyr package draws a random sample of rows from a data frame.
Syntax: sample_n(x, n)
Parameters:
- x: Data Frame
- n: size/number of items to select
Along with sample_n(), we use the filter() function. filter() keeps the rows for which the filtering expression is TRUE and drops the rest.
Syntax: filter(x, expr)
Parameters:
- x: Object to be filtered
- expr: expression as a base for filtering
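As a small sketch of how filter() treats missing values, the code below applies a condition to the length column of the example data frame, which contains NA. It assumes the dplyr package is installed. filter() drops rows where the condition is NA, whereas base R logical subsetting keeps them as all-NA rows.

```r
library(dplyr)

df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

# dplyr: rows where the condition is NA are removed
kept <- filter(df, length > 50)
print(kept)

# Base R logical subsetting: NA conditions produce rows filled with NA
base_subset <- df[df$length > 50, ]
print(base_subset)
```

This difference matters when sampling, because the base R result would let NA rows into the pool unless the condition is wrapped in which().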
We load the dplyr package because it provides both filter() and sample_n(). We pass the example data frame df and the nested condition to filter(), then use sample_n() to draw n rows from those that satisfy the condition.
Syntax: filter(df, condition) %>% sample_n(., n)
Parameters:
- df: Dataframe Object
- condition: Nested conditionals. Ex: df$name != "to"
- n: Number of samples
Example 1:
R
library(dplyr)
# Create the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df
# Keep rows whose name is not "to", then randomly pick 2 of them
print("2 samples")
filter(df, df$name != "to") %>% sample_n(., 2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 Welcome 10 40 yes
2 Geeks 99 95 yes
Example 2:
R
library(dplyr)
# Create the example data frame
df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)
df
# Keep rows where year > 20, then randomly pick 2 of them
print("2 samples")
filter(df, df$year > 20) %>% sample_n(., 2)
Output:
name year length education
1 Welcome 10 40 yes
2 to 51 NA yes
3 Geeks 19 NA no
4 for 126 100 no
5 Geeks 99 95 yes
[1] "2 samples"
name year length education
1 for 126 100 no
2 to 51 NA yes
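The dplyr examples above also use a single condition each. As a sketch of a nested condition in this style, the code below passes several comparisons to filter(), where comma-separated arguments are combined with AND, and draws the sample with slice_sample(), which supersedes sample_n() from dplyr 1.0.0 onward. It assumes dplyr 1.0.0 or later is installed; the condition and the seed value are illustrative assumptions.

```r
library(dplyr)

df <- data.frame(
  name      = c("Welcome", "to", "Geeks", "for", "Geeks"),
  year      = c(10, 51, 19, 126, 99),
  length    = c(40, NA, NA, 100, 95),
  education = c("yes", "yes", "no", "no", "yes")
)

set.seed(1)  # arbitrary seed, used here only to make the draw repeatable

# Nested condition: year above 15 AND (education is "yes" OR length is at least 100)
picked <- df %>%
  filter(year > 15, education == "yes" | length >= 100) %>%
  slice_sample(n = 2)

print(picked)
```

Inside filter(), columns can be referenced by bare name, so the df$ prefix is optional. Three rows (years 51, 126 and 99) satisfy the condition, and two of them are returned.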