How to count the number of occurrences of a character in each row of an R DataFrame?

In this article, we will discuss how to count the number of times a given character occurs in each row of a DataFrame in the R programming language.

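Method 1: Using the stringr package

The stringr package in R provides functions for string manipulation and extraction. It is not part of base R, so it has to be installed (for example with install.packages("stringr")) and then loaded into the workspace.

The str_count() method counts the matches of a specified pattern in a vector of strings. It returns an integer vector, of the same length as the input, holding the number of pattern matches found in each element. The str_count() method is case-sensitive.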

Syntax:

str_count(string, pattern = "")

Parameter :

string – The vector of strings or a single string to search for the pattern
pattern – The pattern to be searched for. Usually a regular expression.

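The pattern can be a single character or a sequence of several characters, and it may contain special symbols or digits. Because the pattern is interpreted as a regular expression by default, symbols with a special meaning in regular expressions (such as "?" or ".") have to be escaped or wrapped in fixed() to be matched literally. If the pattern is not found in an element, the integer value 0 is returned for it.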
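
The different kinds of pattern can be tried out on a small vector. The following is only a sketch: it assumes the stringr package is installed, and the vector `words` and the result names are made up for illustration.

```r
library(stringr)

# hypothetical sample strings (not the data frame used in this article)
words <- c("Geeks", "geeks for geeks", "what?", "CSE")

# single character; matching is case-sensitive, so "E" in "CSE" is skipped
count_e <- str_count(words, "e")
print(count_e)

# pattern made of several characters
count_ee <- str_count(words, "ee")
print(count_ee)

# "?" is a regex metacharacter, so fixed() is used to match it literally
count_q <- str_count(words, fixed("?"))
print(count_q)

# case-insensitive counting via regex(ignore_case = TRUE)
count_e_any <- str_count(words, regex("e", ignore_case = TRUE))
print(count_e_any)
```

Elements in which the pattern does not occur simply get a count of 0, so the result always has one entry per input string.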

Example:

R

# loading the required library
library("stringr")

# creating a data frame
data_frame <- data.frame(
  col1 = 1:5, col2 = c("Geeks", "for", "geeks", "CSE", "portal"))

# character to search for
ch <- "e"

# counting the occurrences of the character in each row of col2
count <- str_count(data_frame$col2, ch)
print("Count of e :")
print(count)

Output:

[1] "Count of e :"
[1] 2 0 2 0 0

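Method 2: Using the gregexpr() method

The gregexpr() method of base R finds the positions at which a pattern occurs in each element of a character vector. It returns a list of integer vectors, one per input string, each holding the starting positions of all matches in that string (or -1 if there is no match). The length of the returned list is equal to the length of the original character vector.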

Syntax:

gregexpr(pattern, str, ignore.case=FALSE)

Parameter :

str – The vector of strings or a single string to search for the pattern
pattern – The pattern to be searched for. Usually a regular expression.
ignore.case – Indicator to ignore case or not
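
To see what this return value looks like, here is a minimal sketch on two made-up strings. It uses the `fixed = TRUE` argument of gregexpr(), which is not listed in the syntax above, so that "!" is treated as a literal string; the variable names are illustrative only.

```r
# two sample strings: one with matches, one without
x <- c("do!es!nt", "Contain")

# fixed = TRUE: treat the pattern as a literal string, not a regex
m <- gregexpr("!", x, fixed = TRUE)

# the result is a list with one element per input string
n_elements <- length(m)
print(n_elements)

# starting positions of every "!" in the first string
pos_first <- as.integer(m[[1]])
print(pos_first)

# a string without any match is marked by a single -1
pos_second <- as.integer(m[[2]])
print(pos_second)
```

Because a missing match is reported as -1 rather than as an empty vector, taking the length of each element directly would give 1 instead of 0 for such strings.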

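Here, the pattern is the character to search for, and str is the data frame column of strings in which to look for it. The regmatches() method is then applied to the output of this function; it extracts (or replaces) the matched substrings described by the match data. For an element with no match, an empty character vector is returned.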

Syntax:

regmatches(str, m)

Parameter :

m – The match data, as returned by gregexpr().

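Next, the lengths() method is applied, which returns the length of each component of the list produced by regmatches(), that is, the number of matches found in each string.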
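
The intermediate results of these steps can be inspected one at a time. This is a sketch on made-up strings with illustrative variable names; `fixed = TRUE` is an addition here so that symbols such as "?", which have a special meaning in regular expressions, are matched literally.

```r
x <- c("cs!!!e", "Contain", "circus?")

# step 1: locate the matches
m <- gregexpr("!", x, fixed = TRUE)

# step 2: extract the matched substrings, one character vector per string
found <- regmatches(x, m)
print(found[[1]])   # the matched "!" substrings of "cs!!!e"
print(found[[2]])   # empty character vector: no match in "Contain"

# step 3: the number of matches is the length of each component
count_excl <- lengths(found)
print(count_excl)

# the same three steps for a regex metacharacter
count_q <- lengths(regmatches(x, gregexpr("?", x, fixed = TRUE)))
print(count_q)
```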

Example:

R

# creating a data frame
data_frame <- data.frame(
  col1 = 1:5, col2 = c("!?contains", "do!es!nt",
                       "Contain", "cs!!!e", "circus?"))

print("Original DataFrame")
print(data_frame)

# character to search for
ch <- "!"

# extracting all matches per row; fixed = TRUE treats ch as a
# literal string, so symbols like "?" can be searched for as well
count <- regmatches(
  data_frame$col2, gregexpr(ch, data_frame$col2, fixed = TRUE))

print("Count of !")

# returning the number of occurrences
print(lengths(count))

Output:

[1] "Original DataFrame"
  col1       col2
1    1 !?contains
2    2   do!es!nt
3    3    Contain
4    4     cs!!!e
5    5    circus?
[1] "Count of !"
[1] 1 2 0 3 0

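Method 3: Using the sapply() method

The sapply() method in R applies a function, here a user-defined one, to each element of the input vector given as its first argument. In this case, the user-defined function consists of the series of steps described below.

Syntax:

sapply(X, FUN)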

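The strsplit() method splits each component of the input vector into pieces at the given separator. With a separator such as " " it returns the words of a multi-word string; with the empty separator "" used here, it returns the individual characters of each string. The result is a list holding one character vector per input element.
The unlist() method is then applied to flatten this list into a plain vector of characters, and each character is compared with the character we are searching for. Finally, the sum() method adds up the TRUE results of the comparison, which gives the number of matches.

Syntax:

sum(unlist(str) == ch)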
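
The effect of each step can be followed on a single string. This is a small sketch with illustrative variable names, not part of the original example.

```r
x <- "cs!!!e"

# strsplit() with split = "" yields a list holding the single characters
pieces <- strsplit(x, split = "")
print(pieces)

# unlist() flattens the list into a plain character vector
chars <- unlist(pieces)
print(chars)

# comparing with the search character gives one TRUE/FALSE per character
is_match <- chars == "!"
print(is_match)

# sum() counts the TRUE values
n_match <- sum(is_match)
print(n_match)
```

Since TRUE counts as 1 and FALSE as 0 when summed, the total is exactly the number of characters equal to the one being searched for.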

Example:

R

# creating a data frame
data_frame <- data.frame(
  col1 = 1:5, col2 = c("!?contains", "do!es!nt",
                       "Contain", "cs!!!e", "circus?"))

print("Original DataFrame")
print(data_frame)

# character to search for
ch <- "!"

# for each string: split it into single characters and
# count how many of them equal the search character
count <- sapply(as.character(data_frame$col2),
                function(x, letter = ch) {
  chars <- strsplit(x, split = "")
  sum(unlist(chars) == letter)
})
print("Count of !")

# returning the number of occurrences
print(count)

Output:

[1] "Original DataFrame"
  col1       col2
1    1 !?contains
2    2   do!es!nt
3    3    Contain
4    4     cs!!!e
5    5    circus?
[1] "Count of !"
!?contains   do!es!nt    Contain     cs!!!e    circus? 
         1          2          0          3          0 
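
The counts in this output are labelled with the input strings because sapply() uses a character input vector as names for its result. If a plain vector is preferred, for instance to store the counts as a new column, the names can be switched off. The sketch below assumes a hypothetical helper `count_one` that wraps the steps of this method.

```r
col2 <- c("!?contains", "do!es!nt", "Contain", "cs!!!e", "circus?")

# hypothetical helper: number of "!" characters in one string
count_one <- function(x) sum(unlist(strsplit(x, split = "")) == "!")

# default behaviour: the result carries the strings as names
named_counts <- sapply(col2, count_one)
print(names(named_counts))

# USE.NAMES = FALSE returns an unnamed integer vector
plain_counts <- sapply(col2, count_one, USE.NAMES = FALSE)
print(plain_counts)
```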
