📅  Last modified: 2020-11-02 04:48:56             🧑  Author: Mango
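This chapter explains how Scala supports regular expressions through the Regex class in the scala.util.matching package.
Try the following example program, which looks for the word Scala in a sentence.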
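import scala.util.matching.Regex

object Demo {
  def main(args: Array[String]): Unit = {
    // Calling .r on a String builds a Regex
    val pattern = "Scala".r
    val str = "Scala is Scalable and cool"

    // findFirstIn returns an Option holding the first match, if any
    println(pattern.findFirstIn(str))
  }
}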
Save the above program in Demo.scala. The following commands compile and run it.
\>scalac Demo.scala
\>scala Demo
Some(Scala)
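We create a String and call the r method on it. Scala implicitly converts the String to a StringOps (called RichString in very old Scala versions) and invokes that method to obtain a Regex instance. To find the first match of the regular expression, call findFirstIn, which returns an Option[String]. To find every occurrence of the matching word rather than only the first, use findAllIn; if the target string contains Scala more than once, it returns an iterator over all the matches.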
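Because findFirstIn returns an Option, the result is usually unpacked with a pattern match rather than printed directly. The following is a minimal sketch of that idea; the object name FindFirstDemo and the helper describe are hypothetical names chosen for illustration.

```scala
object FindFirstDemo {
  // Hypothetical helper: reports whether "Scala" occurs in the given text.
  def describe(text: String): String = {
    val pattern = "Scala".r
    pattern.findFirstIn(text) match {
      case Some(word) => s"found: $word" // first match, if there is one
      case None       => "no match"      // the pattern does not occur
    }
  }

  def main(args: Array[String]): Unit = {
    println(describe("Scala is Scalable and cool")) // found: Scala
    println(describe("Java is cool"))               // no match
  }
}
```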
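You can join the resulting matches with the mkString method, and you can use the pipe (|) to match Scala written with either an uppercase or a lowercase first letter. A pattern can be created either with the Regex constructor or with the r method.
Try the following example program.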
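import scala.util.matching.Regex

object Demo {
  def main(args: Array[String]): Unit = {
    // (S|s) matches either an uppercase or a lowercase first letter
    val pattern = new Regex("(S|s)cala")
    val str = "Scala is scalable and cool"

    // findAllIn yields every match; mkString joins them with commas
    println(pattern.findAllIn(str).mkString(","))
  }
}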
Save the above program in Demo.scala. The following commands compile and run it.
\>scalac Demo.scala
\>scala Demo
Scala,scala
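As a further sketch, the snippet below shows that a pattern built with the r method and one built with the Regex constructor find the same matches, and that findAllMatchIn also exposes where each match starts. The object name FindAllDemo and the helpers allMatches and matchStarts are hypothetical names used only for this illustration.

```scala
import scala.util.matching.Regex

object FindAllDemo {
  // Two equivalent ways of building the same pattern.
  val viaR: Regex = "(S|s)cala".r
  val viaConstructor: Regex = new Regex("(S|s)cala")

  // Collects every matched substring into a List.
  def allMatches(pattern: Regex, text: String): List[String] =
    pattern.findAllIn(text).toList

  // Collects the start index of every match.
  def matchStarts(pattern: Regex, text: String): List[Int] =
    pattern.findAllMatchIn(text).map(_.start).toList

  def main(args: Array[String]): Unit = {
    val str = "Scala is scalable and cool"
    println(allMatches(viaR, str))           // List(Scala, scala)
    println(allMatches(viaConstructor, str)) // List(Scala, scala)
    println(matchStarts(viaR, str))          // List(0, 9)
  }
}
```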
To replace matched text, use replaceFirstIn to replace the first match or replaceAllIn to replace every match.
object Demo {
  def main(args: Array[String]): Unit = {
    val pattern = "(S|s)cala".r
    val str = "Scala is scalable and cool"

    // Only the first match is replaced; later matches are left alone
    println(pattern.replaceFirstIn(str, "Java"))
  }
}
Save the above program in Demo.scala. The following commands compile and run it.
\>scalac Demo.scala
\>scala Demo
Java is scalable and cool
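For comparison, here is a small sketch of replaceAllIn, which rewrites every match instead of only the first. It also uses the overload that takes a function from Regex.Match to String, so that the replacement can depend on the matched text. The object name ReplaceAllDemo and its helper methods are hypothetical names for illustration.

```scala
import scala.util.matching.Regex

object ReplaceAllDemo {
  val pattern: Regex = "(S|s)cala".r

  // Replaces every match with a fixed string.
  def replaceWithJava(text: String): String =
    pattern.replaceAllIn(text, "Java")

  // Replaces every match with its own upper-cased text.
  def shout(text: String): String =
    pattern.replaceAllIn(text, (m: Regex.Match) => m.matched.toUpperCase)

  def main(args: Array[String]): Unit = {
    val str = "Scala is scalable and cool"
    println(replaceWithJava(str)) // Java is Javable and cool
    println(shout(str))           // SCALA is SCALAble and cool
  }
}
```

Note that the match inside "scalable" is replaced as well, because the pattern does not require a word boundary.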
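Scala inherits its regular expression syntax from Java, which in turn inherits most of its features from Perl. The entries below are only a selection, intended as a refresher.
The following table lists the most commonly used regular expression metacharacter syntax available in Java.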
Subexpression | Matches |
---|---|
^ | Matches beginning of line. |
$ | Matches end of line. |
. | Matches any single character except newline. With the DOTALL flag, written (?s) inside the pattern, it matches newline as well. |
[…] | Matches any single character in brackets. |
[^…] | Matches any single character not in brackets |
\\A | Beginning of entire string |
\\z | End of entire string |
\\Z | End of entire string except allowable final line terminator. |
re* | Matches 0 or more occurrences of preceding expression. |
re+ | Matches 1 or more occurrences of preceding expression. |
re? | Matches 0 or 1 occurrence of preceding expression. |
re{n} | Matches exactly n occurrences of preceding expression. |
re{n,} | Matches n or more occurrences of preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of preceding expression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers matched text. |
(?:re) | Groups regular expressions without remembering matched text. |
(?>re) | Matches independent pattern without backtracking. |
\\w | Matches word characters. |
\\W | Matches nonword characters. |
\\s | Matches whitespace. Equivalent to [ \t\n\x0B\f\r]. |
\\S | Matches nonwhitespace. |
\\d | Matches digits. Equivalent to [0-9]. |
\\D | Matches nondigits. |
\\A | Matches beginning of string. |
\\Z | Matches end of string. If a newline exists, it matches just before newline. |
\\z | Matches end of string. |
\\G | Matches point where last match finished. |
\\n | Back-reference to capture group number “n” |
\\b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
\\B | Matches nonword boundaries. |
\\n, \\t, etc. | Matches newlines, carriage returns, tabs, etc. |
\\Q | Escape (quote) all characters up to \\E |
\\E | Ends quoting begun with \\Q |
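
To show how a few of these constructs combine, the sketch below uses capturing groups, \\d and the {n} quantifier to pull the parts out of a date string. A Scala Regex can be used as an extractor in a pattern match, which succeeds only when the whole input matches. The object name GroupDemo, the datePattern value and the parse helper are hypothetical names for illustration.

```scala
object GroupDemo {
  // Three capturing groups: four-digit year, two-digit month, two-digit day.
  val datePattern = "(\\d{4})-(\\d{2})-(\\d{2})".r

  // Returns the numeric parts if the whole string matches, otherwise None.
  def parse(text: String): Option[(Int, Int, Int)] = text match {
    case datePattern(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
    case _                    => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("2020-11-02")) // Some((2020,11,2))
    println(parse("not a date")) // None
  }
}
```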

The following table shows example patterns.

Example | Description |
---|---|
. | Match any character except newline |
[Rr]uby | Match “Ruby” or “ruby” |
rub[ye] | Match “ruby” or “rube” |
[aeiou] | Match any one lowercase vowel |
[0-9] | Match any digit; same as [0123456789] |
[a-z] | Match any lowercase ASCII letter |
[A-Z] | Match any uppercase ASCII letter |
[a-zA-Z0-9] | Match any of the above |
[^aeiou] | Match anything other than a lowercase vowel |
[^0-9] | Match anything other than a digit |
\\d | Match a digit: [0-9] |
\\D | Match a nondigit: [^0-9] |
\\s | Match a whitespace character: [ \t\r\n\f] |
\\S | Match nonwhitespace: [^ \t\r\n\f] |
\\w | Match a single word character: [A-Za-z0-9_] |
\\W | Match a nonword character: [^A-Za-z0-9_] |
ruby? | Match “rub” or “ruby”: the y is optional |
ruby* | Match “rub” plus 0 or more ys |
ruby+ | Match “rub” plus 1 or more ys |
\\d{3} | Match exactly 3 digits |
\\d{3,} | Match 3 or more digits |
\\d{3,5} | Match 3, 4, or 5 digits |
\\D\\d+ | No group: + repeats \\d |
(\\D\\d)+ | Grouped: + repeats the \\D\\d pair |
([Rr]uby(, )?)+ | Match “Ruby”, “Ruby, ruby, ruby”, etc. |
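Note: every backslash appears twice in the patterns above. This is because in Java and Scala a single backslash inside a string literal is an escape character rather than an ordinary character that ends up in the string. You therefore need to write '\\' instead of '\' to get one backslash in the string.
Try the following example program.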
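Scala also offers triple-quoted string literals, in which a backslash is not treated as an escape character, so a pattern can be written with single backslashes. The sketch below compares the two spellings; the object name BackslashDemo and its value names are hypothetical, chosen for illustration.

```scala
object BackslashDemo {
  // Ordinary literal: each backslash must be doubled.
  val escapedSource: String = "abl[ae]\\d+"

  // Triple-quoted literal: backslashes are taken literally.
  val rawSource: String = """abl[ae]\d+"""

  // Finds all matches of the triple-quoted pattern, joined by commas.
  def findAll(text: String): String =
    rawSource.r.findAllIn(text).mkString(",")

  def main(args: Array[String]): Unit = {
    println(escapedSource == rawSource)         // true
    println(findAll("ablaw is able1 and cool")) // able1
  }
}
```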
import scala.util.matching.Regex

object Demo {
  def main(args: Array[String]): Unit = {
    // "abl", then a or e, then one or more digits
    val pattern = new Regex("abl[ae]\\d+")
    val str = "ablaw is able1 and cool"

    println(pattern.findAllIn(str).mkString(","))
  }
}
Save the above program in Demo.scala. The following commands compile and run it.
\>scalac Demo.scala
\>scala Demo
able1