📜  Scala - Regular Expressions

📅  Last modified: 2020-11-02 04:48:56             🧑  Author: Mango


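This chapter explains how Scala supports regular expressions through the Regex class in the scala.util.matching package.

Try the following example program, which looks for the word Scala in a sentence.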

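import scala.util.matching.Regex

object Demo {
   def main(args: Array[String]): Unit = {
      // .r turns the string literal into a Regex
      val pattern: Regex = "Scala".r
      val str = "Scala is Scalable and cool"

      // findFirstIn returns an Option[String]: Some(match) or None
      println(pattern.findFirstIn(str))
   }
}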

Save the above program in Demo.scala. The following commands compile and run it.

Command

>scalac Demo.scala
>scala Demo

Output

Some(Scala)

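We create a String and call the r() method on it. Scala implicitly converts the String to a StringOps (RichString in very old Scala versions) and invokes that method to get an instance of Regex. To find the first match of the regular expression, simply call the findFirstIn() method, which returns an Option[String]. If we want every occurrence of the matching word rather than only the first, we can use the findAllIn() method; when the target string contains the word Scala several times, it returns an iterator over all the matches.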
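Because findFirstIn() returns an Option, the caller usually has to handle both the match and the no-match case. The following is a minimal sketch of one way to do that; the object name FirstMatchSketch and the helper describe are hypothetical and exist only for illustration.

```scala
object FirstMatchSketch {
  // Hypothetical helper: turns the Option returned by findFirstIn into a message.
  def describe(text: String): String = {
    val pattern = "Scala".r
    pattern.findFirstIn(text) match {
      case Some(word) => s"found $word" // at least one match exists
      case None       => "no match"     // the pattern does not occur in text
    }
  }

  def main(args: Array[String]): Unit = {
    println(describe("Scala is Scalable and cool")) // found Scala
    println(describe("Java is cool"))               // no match
  }
}
```

Pattern matching on the Option keeps the no-match case explicit instead of relying on a null check.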

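You can use the mkString() method to join the resulting matches, and the pipe (|) to search for both the capitalized and the lowercase form of Scala. You can also create the pattern with the Regex constructor instead of the r() method.

Try the following example program.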

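import scala.util.matching.Regex

object Demo {
   def main(args: Array[String]): Unit = {
      // (S|s) matches either an uppercase or a lowercase first letter
      val pattern = new Regex("(S|s)cala")
      val str = "Scala is scalable and cool"

      // findAllIn returns an iterator over every match; mkString joins them
      println(pattern.findAllIn(str).mkString(","))
   }
}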

Save the above program in Demo.scala. The following commands compile and run it.

Command

>scalac Demo.scala
>scala Demo

Output

Scala,scala

If you want to replace matching text, you can use replaceFirstIn() to replace the first match or replaceAllIn() to replace all matches.

object Demo {
   def main(args: Array[String]): Unit = {
      val pattern = "(S|s)cala".r
      val str = "Scala is scalable and cool"

      // Only the first match is replaced
      println(pattern.replaceFirstIn(str, "Java"))
   }
}

Save the above program in Demo.scala. The following commands compile and run it.

Command

>scalac Demo.scala
>scala Demo

Output

Java is scalable and cool
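The program above only exercises replaceFirstIn(). For comparison, here is a small sketch of replaceAllIn() on the same input; the object name ReplaceAllSketch and the helper replaceEvery are hypothetical names chosen for this illustration.

```scala
object ReplaceAllSketch {
  val pattern = "(S|s)cala".r

  // Hypothetical helper: replaces every match of the pattern with "Java".
  def replaceEvery(text: String): String =
    pattern.replaceAllIn(text, "Java")

  def main(args: Array[String]): Unit = {
    // Both "Scala" and the "scala" inside "scalable" are replaced.
    println(replaceEvery("Scala is scalable and cool")) // Java is Javable and cool
  }
}
```

Note that the match inside "scalable" is replaced too, because the pattern is not restricted to whole words.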

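Forming Regular Expressions

Scala inherits its regular expression syntax from Java, which in turn inherits most of the features of Perl. The entries here are only a selection, enough to serve as a refresher.

The following table lists commonly used regular expression metacharacters available in Java.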

Subexpression    Matches
^                Matches the beginning of a line (only the beginning of the input unless the (?m) flag is set).
$                Matches the end of a line (only the end of the input unless the (?m) flag is set).
.                Matches any single character except a line terminator. With the (?s) flag it matches line terminators as well.
[...]            Matches any single character in brackets.
[^...]           Matches any single character not in brackets.
\\A              Matches the beginning of the entire string.
\\z              Matches the end of the entire string.
\\Z              Matches the end of the entire string, except for an allowable final line terminator.
re*              Matches 0 or more occurrences of the preceding expression.
re+              Matches 1 or more occurrences of the preceding expression.
re?              Matches 0 or 1 occurrence of the preceding expression.
re{n}            Matches exactly n occurrences of the preceding expression.
re{n,}           Matches n or more occurrences of the preceding expression.
re{n,m}          Matches at least n and at most m occurrences of the preceding expression.
a|b              Matches either a or b.
(re)             Groups regular expressions and remembers the matched text.
(?:re)           Groups regular expressions without remembering the matched text.
(?>re)           Matches an independent pattern without backtracking.
\\w              Matches word characters.
\\W              Matches nonword characters.
\\s              Matches whitespace. Equivalent to [ \t\n\x0B\f\r].
\\S              Matches nonwhitespace.
\\d              Matches digits. Equivalent to [0-9].
\\D              Matches nondigits.
\\G              Matches the point where the last match finished.
\\n              Back-reference to capture group number n, where n is a digit (for example \\1).
\\b              Matches word boundaries. (In Perl it also matches backspace, 0x08, inside brackets; java.util.regex documents only the word-boundary meaning.)
\\B              Matches nonword boundaries.
\\n, \\t, etc.   Matches newlines, tabs, and similar escaped control characters.
\\Q              Escapes (quotes) all characters up to \\E.
\\E              Ends quoting begun with \\Q.
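As an illustration of the grouping entry (re) in the table, a Scala Regex can also be used as an extractor in a match expression, binding each capture group to a variable. The sketch below assumes a simple year-month-day text format; the object name GroupSketch and the helper parse are hypothetical.

```scala
object GroupSketch {
  // Three capture groups: year, month, day. Backslashes are doubled
  // because this is an ordinary string literal.
  val date = "(\\d{4})-(\\d{2})-(\\d{2})".r

  // Hypothetical helper: extracts the three groups when the whole string matches.
  def parse(text: String): Option[(Int, Int, Int)] = text match {
    case date(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("2020-11-02")) // Some((2020,11,2))
    println(parse("2020/11/02")) // None
  }
}
```

When a Regex is used this way, the pattern has to match the entire input string, not just a part of it.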

Regular Expression Examples

Example            Description
.                  Match any character except newline
[Rr]uby            Match “Ruby” or “ruby”
rub[ye]            Match “ruby” or “rube”
[aeiou]            Match any one lowercase vowel
[0-9]              Match any digit; same as [0123456789]
[a-z]              Match any lowercase ASCII letter
[A-Z]              Match any uppercase ASCII letter
[a-zA-Z0-9]        Match any of the above
[^aeiou]           Match anything other than a lowercase vowel
[^0-9]             Match anything other than a digit
\\d                Match a digit: [0-9]
\\D                Match a nondigit: [^0-9]
\\s                Match a whitespace character: [ \t\r\n\f]
\\S                Match nonwhitespace: [^ \t\r\n\f]
\\w                Match a single word character: [A-Za-z0-9_]
\\W                Match a nonword character: [^A-Za-z0-9_]
ruby?              Match “rub” or “ruby”: the y is optional
ruby*              Match “rub” plus 0 or more ys
ruby+              Match “rub” plus 1 or more ys
\\d{3}             Match exactly 3 digits
\\d{3,}            Match 3 or more digits
\\d{3,5}           Match 3, 4, or 5 digits
\\D\\d+            No group: + repeats \\d
(\\D\\d)+          Grouped: + repeats the \\D\\d pair
([Rr]uby(, )?)+    Match “Ruby”, “Ruby, ruby, ruby”, etc.
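A few of the quantifier rows above can be checked with findAllIn(). This is only a sketch; the object name QuantifierSketch, the helper matchesOf, and the sample input strings are assumptions made for the illustration.

```scala
object QuantifierSketch {
  // Hypothetical helper: compiles the regex and collects every match as a List.
  def matchesOf(regex: String, text: String): List[String] =
    regex.r.findAllIn(text).toList

  def main(args: Array[String]): Unit = {
    val words = "rub ruby rubyy"
    println(matchesOf("ruby?", words)) // List(rub, ruby, ruby)
    println(matchesOf("ruby+", words)) // List(ruby, rubyy)

    // {3,5} is greedy: it takes up to five digits, and the leftover "67" is too short.
    println(matchesOf("\\d{3,5}", "12 1234 1234567")) // List(1234, 12345)
  }
}
```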

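Note: every backslash appears twice in the strings above. This is because in Java and Scala a single backslash inside a string literal is an escape character, not an ordinary character that shows up in the string. So you need to write '\\' instead of '\' to get a single backslash in the string.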
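Scala's triple-quoted string literals do not process backslash escapes, so they offer a way to write a pattern with single backslashes. The sketch below compares both spellings of the same pattern; the object name RawStringSketch is hypothetical.

```scala
object RawStringSketch {
  // Ordinary literal: each backslash must be doubled.
  val escaped = "abl[ae]\\d+".r

  // Triple-quoted literal: backslashes are taken literally.
  val tripleQuoted = """abl[ae]\d+""".r

  def main(args: Array[String]): Unit = {
    // Both literals produce the same regular expression text.
    println(escaped.regex == tripleQuoted.regex) // true
    println(tripleQuoted.findAllIn("ablaw is able1 and cool").mkString(",")) // able1
  }
}
```

The triple-quoted form tends to be easier to read once a pattern contains several escapes.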

Try the following example program.

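import scala.util.matching.Regex

object Demo {
   def main(args: Array[String]): Unit = {
      // "abl", then "a" or "e", then one or more digits
      val pattern = new Regex("abl[ae]\\d+")
      val str = "ablaw is able1 and cool"

      // "ablaw" has no digit after "abla", so only "able1" matches
      println(pattern.findAllIn(str).mkString(","))
   }
}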

Save the above program in Demo.scala. The following commands compile and run it.

Command

>scalac Demo.scala
>scala Demo

Output

able1