📌  相关文章
📜  在 R 中将 DataFrame 列转换为数字

📅  最后修改于: 2022-05-13 01:54:36.605000             🧑  作者: Mango

在 R 中将 DataFrame 列转换为数字

在本文中,我们将看到如何在 R 编程语言中将 DataFrame Column 转换为 Numeric。

所有数据框列都与一个类相关联,该类是该列的元素所属的数据类型的指示符。因此,为了模拟数据类型转换,在这种情况下必须将数据元素转换为所需的数据类型,即该列的所有元素都应该有资格成为数值。

sapply() 方法可用于以向量的形式检索列变量的数据类型。用于以下操作的数据框如下:

R
# declare a dataframe
# different data type have been 
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of 
# each variable 
sapply(data_frame, class)


R
# declare a dataframe
# different data type have been
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting factor type column to 
# numeric 
data_frame_mod <- transform(
  data_frame,col2 = as.numeric(col2))
  
print("Modified DataFrame")
print (data_frame_mod)
  
# indicating the data type of each variable 
sapply(data_frame_mod, class)


R
# declare a dataframe
# different data type have been 
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable 
sapply(data_frame, class)
  
# converting factor type column to 
# numeric 
data_frame_mod <- transform(
  data_frame, col2 = as.numeric(as.character(col2)))
  
print("Modified DataFrame")
print (data_frame_mod)
  
# indicating the data type of each
# variable 
sapply(data_frame_mod, class)


R
# declare a dataframe
# different data type have been 
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting character type column
# to numeric 
data_frame_col1 <- transform(
  data_frame,col1 = as.numeric(col1))
  
print("Modified col1 DataFrame")
print (data_frame_col1)
  
# indicating the data type of each 
# variable 
sapply(data_frame_col1, class)
  
  
# converting character type column 
# to numeric 
data_frame_col3 <- transform(
  data_frame,col3 = as.numeric(col3))
  
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable 
sapply(data_frame_col3, class)


R
# declare a dataframe
# different data type have been 
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = c("Geeks","For","Geeks","Gooks"), 
                col4 = 97:100)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable 
sapply(data_frame, class)
  
# converting character type column 
# to numeric 
data_frame_col3 <- transform(
  data_frame,col3 = as.numeric(as.factor(col3)))
  
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable 
sapply(data_frame_col3, class)


R
# declare a dataframe
# different data type have been
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = c("Geeks","For","Geeks","Gooks"), 
                col4 = 97:100,
                col5 = c(TRUE,FALSE,TRUE,FALSE))
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting character type column 
# to numeric 
data_frame_col5 <- transform(
  data_frame,col5 = as.numeric(as.factor(col5)))
print("Modified col5 DataFrame")
print (data_frame_col5)
  
# indicating the data type of each 
# variable 
sapply(data_frame_col5, class)


输出:



[1] "Original DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"

transform() 方法可用于模拟在此方法的参数列表中指定的数据对象中的修改。必须将更改显式保存到相同的数据帧或新的数据帧中。它可用于向数据添加新变量或修改现有变量。

示例 1:将因子类型列转换为数字

进行这些转换时可能不会保留数据。数据可能会丢失或被篡改。转换操作的结果必须保存在某个变量中,以便进一步使用它。以下代码片段说明了这一点:

电阻

# declare a dataframe
# different data type have been
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting factor type column to 
# numeric 
data_frame_mod <- transform(
  data_frame,col2 = as.numeric(col2))
  
print("Modified DataFrame")
print (data_frame_mod)
  
# indicating the data type of each variable 
sapply(data_frame_mod, class)

输出:

[1] "Original DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified DataFrame"
 col1 col2 col3 col4
1    1    1    b   97
2    2    2    c   98
3    3    3    d   99
4    4    4    e  100
      col1        col2        col3        col4
"character"   "numeric" "character"   "integer" 

说明: col2 中原始数据帧的值范围为 4 到 7,而修改后它们是从 1 开始的整数。这意味着在直接将因子转换为数字时,数据可能不会被保留。



为了保留数据,需要首先将列的类型显式转换为 as。字符(列名)。

电阻

# declare a dataframe
# different data type have been 
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(1:4), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable 
sapply(data_frame, class)
  
# converting factor type column to 
# numeric 
data_frame_mod <- transform(
  data_frame, col2 = as.numeric(as.character(col2)))
  
print("Modified DataFrame")
print (data_frame_mod)
  
# indicating the data type of each
# variable 
sapply(data_frame_mod, class)

输出:

[1] "Original DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified DataFrame"
 col1 col2 col3 col4
1    1    4    b   97
2    2    5    c   98
3    3    6    d   99
4    4    7    e  100
      col1        col2        col3        col4
"character"   "numeric" "character"   "integer" 

说明:为了保持数据的一致性,首先将col2的数据类型改为as。字符,然后到数值,它按原样显示数据。

示例 2:将字符类型列转换为数字

字符类型的列,是单个字符或字符串,只有在这些转换是可能的情况下才能转换为数值。否则,数据将丢失并在执行时被编译器强制为缺失值或 NA 值。

这种方法描述了由于插入缺失值或 NA 值代替字符而导致的数据丢失。引入这些 NA 值是因为不能直接进行相互转换。

电阻

# declare a dataframe
# different data type have been 
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = letters[2:5], 
                col4 = 97:100, stringsAsFactors = FALSE)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting character type column
# to numeric 
data_frame_col1 <- transform(
  data_frame,col1 = as.numeric(col1))
  
print("Modified col1 DataFrame")
print (data_frame_col1)
  
# indicating the data type of each 
# variable 
sapply(data_frame_col1, class)
  
  
# converting character type column 
# to numeric 
data_frame_col3 <- transform(
  data_frame,col3 = as.numeric(col3))
  
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable 
sapply(data_frame_col3, class)

输出:

[1] "Original DataFrame"
 col1 col2 col3 col4
1    6    4    b   97
2    7    5    c   98
3    8    6    d   99
4    9    7    e  100
      col1        col2        col3        col4
"character"    "factor" "character"   "integer"
[1] "Modified col1 DataFrame"
 col1 col2 col3 col4
1    6    4    b   97
2    7    5    c   98
3    8    6    d   99
4    9    7    e  100
      col1        col2        col3        col4
 "numeric"    "factor" "character"   "integer"
[1] "Modified col3 DataFrame"
 col1 col2 col3 col4
1    6    4   NA   97
2    7    5   NA   98
3    8    6   NA   99
4    9    7   NA  100
      col1        col2        col3        col4
"character"    "factor"   "numeric"   "integer"
Warning message:
In eval(substitute(list(...)), `_data`, parent.frame()) :
 NAs introduced by coercion

说明:使用 sapply() 方法,数据帧的 col3 的类是字符,即它由单字节字符值组成,但在应用 transform() 方法时,这些字符值被转换为缺失值或 NA 值,因为字符不能直接转换为数字数据。因此,这会导致数据丢失。



可以通过不使用 stringAsFactors=FALSE 进行转换,然后首先使用 as.factor() 将字符隐式转换为因子,然后使用 as.numeric() 隐式转换为数字数据类型。即使在这种情况下,有关实际字符串的信息也会完全丢失。但是,数据变得不明确,并可能导致实际数据丢失。根据列值的字典排序结果简单地为数据分配数值。

电阻

# declare a dataframe
# different data type have been 
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = c("Geeks","For","Geeks","Gooks"), 
                col4 = 97:100)
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each
# variable 
sapply(data_frame, class)
  
# converting character type column 
# to numeric 
data_frame_col3 <- transform(
  data_frame,col3 = as.numeric(as.factor(col3)))
  
print("Modified col3 DataFrame")
print (data_frame_col3)
  
# indicating the data type of each
# variable 
sapply(data_frame_col3, class)

输出:

[1] "Original DataFrame"
 col1 col2  col3 col4
1    6    4 Geeks   97
2    7    5   For   98
3    8    6 Geeks   99
4    9    7 Gooks  100
    col1      col2      col3      col4
"factor"  "factor"  "factor" "integer"
[1] "Modified col3 DataFrame"
 col1 col2 col3 col4
1    6    4    2   97
2    7    5    1   98
3    8    6    2   99
4    9    7    3  100
    col1      col2      col3      col4
"factor"  "factor" "numeric" "integer" 

说明: col3 中的第一个和第三个字符串相同,因此分配了相同的数值。总的来说,这些值按升序排序,然后分配相应的整数值。 “for”是字典序中出现的最小字符串,因此,赋值为1,然后“Geeks”,其两个实例都映射到2,“Gooks”被赋值为3。因此,col3类型更改为数字。

示例 3:将逻辑类型列转换为数字

布尔值真值被赋予一个相当于 2 的数值,假布尔值被赋予一个数值 1。转换可以很容易地进行,同时保持数据的一致性。

为了保留数据,首先使用 as.factor 将包含这些逻辑值的列转换为因子类型值,然后使用 as.numeric() 为这些值分配一个数值,它只是为这两个值分配整数标识符.

电阻

# declare a dataframe
# different data type have been
# indicated for different cols
data_frame <- data.frame(
                col1 = as.character(6:9), 
                col2 = factor(4:7), 
                col3 = c("Geeks","For","Geeks","Gooks"), 
                col4 = 97:100,
                col5 = c(TRUE,FALSE,TRUE,FALSE))
  
print("Original DataFrame")
print (data_frame)
  
# indicating the data type of each 
# variable 
sapply(data_frame, class)
  
# converting character type column 
# to numeric 
data_frame_col5 <- transform(
  data_frame,col5 = as.numeric(as.factor(col5)))
print("Modified col5 DataFrame")
print (data_frame_col5)
  
# indicating the data type of each 
# variable 
sapply(data_frame_col5, class)

输出:

[1] "Original DataFrame"
 col1 col2  col3 col4  col5
1    6    4 Geeks   97  TRUE
2    7    5   For   98 FALSE
3    8    6 Geeks   99  TRUE
4    9    7 Gooks  100 FALSE
    col1      col2      col3      col4      col5
"factor"  "factor"  "factor" "integer" "logical"
[1] "Modified col5 DataFrame"
 col1 col2  col3 col4 col5
1    6    4 Geeks   97    2
2    7    5   For   98    1
3    8    6 Geeks   99    2
4    9    7 Gooks  100    1
    col1      col2      col3      col4      col5
"factor"  "factor"  "factor" "integer" "numeric" 

说明:使用 sapply() 方法,dataframe 的 col5 的类是逻辑的,即它由 TRUE 和 FALSE 布尔值组成,但是在 transform() 方法的应用中,这些逻辑值被映射为整数,并且col5 的类被转换为数字。