Split a DataFrame into Custom Bins in R
In this article, we look at how to split a data frame into custom bins in the R programming language.
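The cut() function in base R divides the range of a numeric vector (here, the row indices of the data frame) into intervals and then assigns each value to the interval it falls in. Each interval becomes one level of the resulting factor. When breaks is a vector of cut points, the number of levels is therefore length(breaks) - 1; when breaks is a single number, it is the number of intervals.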
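Syntax: cut(x, breaks, labels = NULL)
Arguments:
- x – numeric vector to be divided into intervals
- breaks – either a numeric vector of two or more cut points, or a single number giving the number of intervals
- labels – labels for the resulting levels; by default they are built in interval notation such as "(a,b]"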
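To make the arguments concrete, here is a minimal sketch of cut() on a plain numeric vector. The vector x and the labels "low" and "high" are made up for illustration and are not part of the examples that follow.

```r
# Hypothetical numeric vector used only for illustration
x <- c(1, 3, 5, 7, 9)

# Three cut points define two intervals: (0,4] and (4,10]
groups <- cut(x, breaks = c(0, 4, 10), labels = c("low", "high"))

# A factor that records the interval each value falls in
print(groups)

# The number of levels is length(breaks) - 1
print(nlevels(groups))
```

Because the intervals are closed on the right by default, a value equal to a cut point (for example 4 here) would land in the interval that ends at that point.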
Example 1: Splitting a data frame into custom bins
R
# creating a data frame
data_frame <- data.frame(col1 = c(1:10),
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)
# getting the number of rows
rows <- nrow(data_frame)
# custom bins over the row indices: (0,6] and (6,rows]
bins <- cut(1:rows,
            breaks = c(0, 6, rows))
level_bins <- levels(bins)
# creating one data frame per bin: data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}
# printing the data frame subsets
print("DataFrame Subset 1")
print(data_frame_1)
print("DataFrame Subset 2")
print(data_frame_2)
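As a possible alternative to creating numbered variables with assign(), the same subsets can be collected in a named list with base R's split(). This is only a sketch of that variation, not the approach the article uses; the name subsets is an assumption chosen for illustration.

```r
# Same data frame as in the example above
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))

# Bin the row indices into (0,6] and (6,10]
bins <- cut(seq_len(nrow(data_frame)),
            breaks = c(0, 6, nrow(data_frame)))

# split() returns a list with one data frame per bin,
# named after the factor levels
subsets <- split(data_frame, bins)

print(names(subsets))
print(subsets[[1]])
print(subsets[[2]])
```

Keeping the pieces in one list avoids cluttering the workspace and makes it easy to loop over them, for example with lapply(subsets, nrow).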
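Output: the original data frame is printed first, followed by subset 1 (rows 1 to 6) and subset 2 (rows 7 to 10).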
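Example 2: The following code specifies four break points (0, 2, 4, and the row count), which divide the rows into three subsets of the original data frame.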
R
# creating a data frame
data_frame <- data.frame(col1 = c(1:10),
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)
# getting the number of rows
rows <- nrow(data_frame)
# custom bins over the row indices: (0,2], (2,4] and (4,rows]
bins <- cut(1:rows,
            breaks = c(0, 2, 4, rows))
level_bins <- levels(bins)
# creating one data frame per bin: data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}
# printing the data frame subsets
print("DataFrame Subset 1")
print(data_frame_1)
print("DataFrame Subset 2")
print(data_frame_2)
print("DataFrame Subset 3")
print(data_frame_3)
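The labels argument from the syntax section can give these three bins readable names. The sketch below assumes the hypothetical labels "head", "middle" and "tail" and stores the bin as an extra column, which is one possible way to keep the grouping next to the data.

```r
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10])
rows <- nrow(data_frame)

# Hypothetical names, one per interval (length(breaks) - 1 labels)
bin_labels <- c("head", "middle", "tail")

# Store the bin of each row as a factor column
data_frame$bin <- cut(seq_len(rows),
                      breaks = c(0, 2, 4, rows),
                      labels = bin_labels)

# Count the rows that fall into each bin
bin_counts <- table(data_frame$bin)
print(bin_counts)

# Select a single bin by its label
print(data_frame[data_frame$bin == "tail", ])
```

Note that cut() raises an error if the number of labels does not match the number of intervals.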
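Output: the original data frame is printed first, followed by three subsets containing rows 1 to 2, rows 3 to 4, and rows 5 to 10.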
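Example 3: cut() can also be given the number of intervals to create instead of explicit break points. This number is passed as the second argument. The range of the input is then divided into that many intervals of equal width, and the level names are generated automatically. Because the row indices 1 to 10 are evenly spaced, the following code divides the data frame into 5 bins of two rows each: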
R
# creating a data frame
data_frame <- data.frame(col1 = c(1:10),
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)
# getting the number of rows
rows <- nrow(data_frame)
# five equal-width bins over the row indices
bins <- cut(1:rows, 5)
level_bins <- levels(bins)
# creating one data frame per bin: data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}
# printing the data frame subsets
print("DataFrame Subset 1")
print(data_frame_1)
print("DataFrame Subset 2")
print(data_frame_2)
print("DataFrame Subset 3")
print(data_frame_3)
print("DataFrame Subset 4")
print(data_frame_4)
print("DataFrame Subset 5")
print(data_frame_5)
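One caveat worth illustrating: a single number passed to cut() produces intervals of equal width, which gives equally sized groups only when the input is evenly spaced, as the row indices are above. The sketch below uses a made-up, skewed vector (values) to show the difference between binning the values themselves and binning their positions.

```r
# Hypothetical, unevenly spaced values used only for illustration
values <- c(1, 2, 3, 4, 100)

# Equal-width bins over the values: most values fall into the first interval
width_bins <- cut(values, 2)
print(table(width_bins))

# Binning the positions instead gives groups of similar size;
# labels = FALSE returns integer bin codes rather than a factor
position_bins <- cut(seq_along(values), 2, labels = FALSE)
print(table(position_bins))
```

This is why the examples in this article apply cut() to 1:rows rather than to a data column: the split is then driven by row position, not by the data values.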
Output: the original data frame is printed first, followed by five subsets of two rows each.