Split a DataFrame into Custom Bins in R

Last modified: 2022-05-13 01:55:44.162000    Author: Mango

In this article, we will see how to split a data frame into custom bins in the R programming language.

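The cut() function in base R divides the range of a numeric vector into intervals. It then codes each value by the interval it falls into and returns a factor. In the examples below, the vector is the row indices 1:nrow(data_frame), so each interval (each level of the factor) corresponds to one subset of the data frame's rows. When breaks is a vector of cut points, the number of levels is one less than the length of breaks, because n cut points bound n - 1 intervals. By default, each interval is open on the left and closed on the right, as in (0,6].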
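
The sketch below shows how cut() treats a breaks vector. It assumes a hypothetical 10-row data frame, so the literal 10 stands in for nrow(data_frame). Three cut points produce two right-closed intervals.

```r
# Row indices of a hypothetical 10-row data frame (10 stands in for nrow())
idx <- 1:10

# Three cut points define two intervals: (0,6] and (6,10]
bins <- cut(idx, breaks = c(0, 6, 10))

print(levels(bins))   # "(0,6]"  "(6,10]"
print(nlevels(bins))  # 2, one fewer than length(breaks)
print(table(bins))    # 6 indices fall in the first bin, 4 in the second
```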

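Example 1: Split a data frame into two custom bins. The break points c(0, 6, rows) define the intervals (0,6] and (6,rows]. With 10 rows, the first six rows therefore form one subset and the last four form another.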



R
# creating a data frame
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)

# getting the number of rows
rows <- nrow(data_frame)

# custom bins over the row indices: (0,6] and (6,rows]
bins <- cut(1:rows, breaks = c(0, 6, rows))
level_bins <- levels(bins)

# storing each subset in its own variable: data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}

# retrieving the data frame subsets
print("DataFrame Subset 1")
print(data_frame_1)

print("DataFrame Subset 2")
print(data_frame_2)

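Output: the original 10-row data frame is printed first. Subset 1 then shows rows 1 to 6, and subset 2 shows rows 7 to 10.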
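
The loop above uses assign() to create one global variable per bin. One alternative is base R's split(), which collects the same subsets into a single named list keyed by the bin labels. This is a swapped-in technique, not the article's method. The sketch below reuses Example 1's data frame and breaks.

```r
# Same data frame as in Example 1
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))

# Bin the row indices with the same custom breaks
bins <- cut(seq_len(nrow(data_frame)), breaks = c(0, 6, nrow(data_frame)))

# split() returns a list holding one data frame per factor level
subsets <- split(data_frame, bins)

print(names(subsets))      # the interval labels, used as list names
print(subsets[["(0,6]"]])  # rows 1 to 6
print(subsets[[2]])        # rows 7 to 10
```

A list keeps the subsets together, so they can be processed with lapply() instead of being looked up by constructed variable names.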

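Example 2: This example passes the break points c(0, 2, 4, rows). These define three intervals, (0,2], (2,4] and (4,rows], and so split the rows of the original data frame into three subsets.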

R

# creating a data frame
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)

# getting the number of rows
rows <- nrow(data_frame)

# custom bins over the row indices: (0,2], (2,4] and (4,rows]
bins <- cut(1:rows, breaks = c(0, 2, 4, rows))
level_bins <- levels(bins)

# storing each subset in its own variable: data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}

# retrieving the data frame subsets
print("DataFrame Subset 1")
print(data_frame_1)

print("DataFrame Subset 2")
print(data_frame_2)

print("DataFrame Subset 3")
print(data_frame_3)

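Output: after the original data frame, subset 1 shows rows 1 and 2, subset 2 shows rows 3 and 4, and subset 3 shows rows 5 to 10.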
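
cut() also accepts a labels argument, so each bin can carry a readable name instead of the interval notation. The sketch below uses Example 2's breaks with three hypothetical labels, "first", "second" and "third". It collects the subsets with base R's split() rather than the assign() loop.

```r
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))
rows <- nrow(data_frame)

# Hypothetical bin names: one label is needed per interval (length(breaks) - 1)
bins <- cut(1:rows,
            breaks = c(0, 2, 4, rows),
            labels = c("first", "second", "third"))

# One data frame per labelled bin, stored in a named list
subsets <- split(data_frame, bins)
print(subsets$third)  # rows 5 to 10
```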

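Example 3: Instead of a vector of cut points, breaks (the second argument of cut()) can be a single number giving how many intervals to create. cut() then divides the range of the values into that many intervals of equal width and labels each level with its interval. The row indices are evenly spaced, so the following code divides the data frame into 5 custom bins of equal size (2 rows each):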

R

# creating a data frame
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))

print("Original DataFrame")
print(data_frame)

# getting the number of rows
rows <- nrow(data_frame)

# 5 equal-width bins over the row indices 1 to rows
bins <- cut(1:rows, 5)
level_bins <- levels(bins)

# storing each subset in its own variable: data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}

# retrieving the data frame subsets
print("DataFrame Subset 1")
print(data_frame_1)

print("DataFrame Subset 2")
print(data_frame_2)

print("DataFrame Subset 3")
print(data_frame_3)

print("DataFrame Subset 4")
print(data_frame_4)

print("DataFrame Subset 5")
print(data_frame_5)

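Output: after the original data frame, the five subsets show rows 1 to 2, 3 to 4, 5 to 6, 7 to 8, and 9 to 10, respectively.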
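
When breaks is a single number, cut() creates intervals of equal width, not bins holding equal numbers of values. The two coincide in Example 3 only because the row indices are evenly spaced. The sketch below uses a hypothetical, unevenly spaced vector to show the difference. It also shows quantile() break points as one possible way (an alternative, not part of the article) to get bins of roughly equal size.

```r
# Hypothetical, unevenly spaced values
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 50, 100)

# Equal width: 5 intervals of the same width over range(x)
equal_width <- cut(x, 5)
print(table(equal_width))  # counts 8 0 1 0 1

# Roughly equal counts: break at the 0%, 20%, ..., 100% quantiles;
# include.lowest = TRUE keeps the minimum value in the first bin
q_breaks <- quantile(x, probs = seq(0, 1, length.out = 6))
equal_count <- cut(x, breaks = q_breaks, include.lowest = TRUE)
print(table(equal_count))  # 2 values in each bin
```

Quantile breaks must be unique for cut() to accept them, so heavily tied data may need fewer bins.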