📅  Last modified: 2020-12-21 01:40:48             🧑  Author: Mango
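Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to those of the Perl programming language and are easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions can be used to search, edit, or manipulate text and data.
The java.util.regex package primarily consists of the following three classes:
Pattern class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you invoke one of its public static compile() methods, which return a Pattern object. These methods accept a regular expression as the first argument.
Matcher class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher() method on a Pattern object.
PatternSyntaxException: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.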
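Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
Group 1: ((A)(B(C)))
Group 2: (A)
Group 3: (B(C))
Group 4: (C)
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int giving the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.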
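The numbering rule can be checked directly. The following is a minimal sketch, not part of the original tutorial: the class name GroupNumberingDemo and the helper groups() are hypothetical names chosen for illustration. It matches ((A)(B(C))) against the input "ABC" and collects group 0 through groupCount().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns the text of groups 0..groupCount() for the first match,
    // or an empty array if the pattern is not found in the input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
    }
}
```

The array has five entries because group 0 (the whole match) is stored alongside the four capturing groups that groupCount reports.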
Example
The following example shows how to find a digit string within a given alphanumeric string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    public static void main(String[] args) {
        // String to be scanned to find the pattern.
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(.*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
        } else {
            System.out.println("NO MATCH");
        }
    }
}
This produces the following result:
Output
Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT300
Found value: 0
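Group 2 holds only "0" because the leading (.*) is greedy: it consumes as much as it can and leaves \d+ just the single digit it needs. As a hedged variation (not from the original tutorial), the sketch below swaps in the reluctant quantifier .*? so that the digit group captures the whole number. The class name ReluctantGroupDemo and the helper firstNumber() are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantGroupDemo {
    // Returns the first run of digits in the line, or null if there is none.
    // The reluctant (.*?) consumes as little as possible, so (\d+) starts
    // at the first digit instead of the last one.
    static String firstNumber(String line) {
        Matcher m = Pattern.compile("(.*?)(\\d+)(.*)").matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("This order was placed for QT3000! OK?"));
    }
}
```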
The following table lists the regular expression metacharacter syntax available in Java:
Subexpression | Matches |
---|---|
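^ | Matches the beginning of the input, or the beginning of each line when MULTILINE mode is enabled. |
$ | Matches the end of the input (or just before a final line terminator), or the end of each line when MULTILINE mode is enabled. |
. | Matches any single character except a line terminator. Enabling DOTALL mode (the s flag) allows it to match line terminators as well. |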
[…] | Matches any single character in brackets. |
[^…] | Matches any single character not in brackets. |
\A | Beginning of the entire string. |
\z | End of the entire string. |
\Z | End of the entire string except allowable final line terminator. |
re* | Matches 0 or more occurrences of the preceding expression. |
re+ | Matches 1 or more occurrences of the preceding expression. |
re? | Matches 0 or 1 occurrence of the preceding expression. |
re{n} | Matches exactly n occurrences of the preceding expression. |
re{n,} | Matches n or more occurrences of the preceding expression. |
re{n,m} | Matches at least n and at most m occurrences of the preceding expression. |
a\|b | Matches either a or b. |
(re) | Groups regular expressions and remembers the matched text. |
(?:re) | Groups regular expressions without remembering the matched text. |
(?>re) | Matches the independent pattern without backtracking. |
\w | Matches the word characters. |
\W | Matches the nonword characters. |
\s | Matches a whitespace character. Equivalent to [ \t\n\x0B\f\r]. |
\S | Matches the nonwhitespace. |
\d | Matches the digits. Equivalent to [0-9]. |
\D | Matches the nondigits. |
\A | Matches the beginning of the string. |
\Z | Matches the end of the string. If a newline exists, it matches just before newline. |
\z | Matches the end of the string. |
\G | Matches the point where the last match finished. |
\n | Back-reference to capture group number “n”. |
\b | Matches a word boundary. Some other regex flavors treat \b inside brackets as a backspace (0x08), but java.util.regex does not; use \x08 to match a backspace. |
\B | Matches the nonword boundaries. |
\n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
\Q | Escape (quote) all characters up to \E. |
\E | Ends quoting begun with \Q. |
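Two of the entries above, the back-reference and the \Q...\E quoting pair, are easier to see in use. The following is an illustrative sketch only: the class MetacharDemo and its two helpers are hypothetical names, and containsLiteral() assumes the literal text does not itself contain \E.

```java
import java.util.regex.Pattern;

class MetacharDemo {
    // True if some word is immediately repeated, e.g. "is is".
    // \1 refers back to whatever group 1 captured.
    static boolean hasRepeatedWord(String text) {
        return Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(text).find();
    }

    // True if the literal text occurs in the input. Wrapping it in \Q...\E
    // makes characters such as + lose their special meaning.
    // Assumption: the literal does not contain the sequence \E.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile("\\Q" + literal + "\\E").matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("this is is a test"));
        System.out.println(containsLiteral("1+1=2", "1+1"));
        System.out.println(containsLiteral("11=2", "1+1"));
    }
}
```

Without the quoting, the pattern 1+1 would mean "one or more 1s followed by 1" and would match "11". For arbitrary literals, the static method Pattern.quote(String) builds the quoted form and also handles text that contains \E.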
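Here is a list of useful instance methods of the Matcher class.
Index methods provide useful index values that show precisely where the match was found in the input string: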
Sr.No. | Method & Description |
---|---|
1 | public int start(): Returns the start index of the previous match. |
2 | public int start(int group): Returns the start index of the subsequence captured by the given group during the previous match operation. |
3 | public int end(): Returns the offset after the last character matched. |
4 | public int end(int group): Returns the offset after the last character of the subsequence captured by the given group during the previous match operation. |
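The group-specific overloads report where each capturing group landed inside the match. The following is a small sketch under stated assumptions: the class GroupIndexDemo, the helper span(), and the sample input are all hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Returns {start, end} of the given group for the first match,
    // or null if the pattern is not found.
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String input = "mail bob@example now";
        for (int g = 0; g <= 2; g++) {
            int[] s = span("(\\w+)@(\\w+)", input, g);
            System.out.println("group " + g + ": [" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

Because end is the offset after the last character, input.substring(start, end) yields exactly the text the group captured.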
Study methods review the input string and return a boolean indicating whether or not the pattern is found:
Sr.No. | Method & Description |
---|---|
1 | public boolean lookingAt(): Attempts to match the input sequence, starting at the beginning of the region, against the pattern. |
2 | public boolean find(): Attempts to find the next subsequence of the input sequence that matches the pattern. |
3 | public boolean find(int start): Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index. |
4 | public boolean matches(): Attempts to match the entire region against the pattern. |
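Of these, find(int start) is the one whose behavior is least obvious from its name. As a rough sketch (the class FindFromIndexDemo and the helper findFrom() are hypothetical), the code below searches from a chosen offset and reports where the next match begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromIndexDemo {
    // Returns the start index of the first match at or after 'from',
    // or -1 if there is none. find(int) resets the matcher first and
    // throws IndexOutOfBoundsException if 'from' is negative or
    // greater than the input length.
    static int findFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        String input = "cat cat cat";
        System.out.println(findFrom("cat", input, 0));
        System.out.println(findFrom("cat", input, 1));
        System.out.println(findFrom("cat", input, 9));
    }
}
```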
Replacement methods are useful methods for replacing text in an input string:
Sr.No. | Method & Description |
---|---|
1 | public Matcher appendReplacement(StringBuffer sb, String replacement): Implements a non-terminal append-and-replace step. |
2 | public StringBuffer appendTail(StringBuffer sb): Implements a terminal append-and-replace step. |
3 | public String replaceAll(String replacement): Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. |
4 | public String replaceFirst(String replacement): Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. |
5 | public static String quoteReplacement(String s): Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class. |
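In a replacement string, $ followed by a digit refers to a capturing group, which is why quoteReplacement exists. The sketch below is illustrative only: the class QuoteReplacementDemo and both helpers are hypothetical. It contrasts a deliberate group reference with a replacement that must be taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Uses $2 and $1 on purpose to swap the two captured parts.
    static String swap(String input) {
        return Pattern.compile("(\\w+)@(\\w+)").matcher(input).replaceAll("$2:$1");
    }

    // Inserts the price text literally. Without quoteReplacement, a value
    // such as "$5.00" would be read as a reference to group 5.
    static String fillPrice(String template, String price) {
        Matcher m = Pattern.compile("PRICE").matcher(template);
        return m.replaceAll(Matcher.quoteReplacement(price));
    }

    public static void main(String[] args) {
        System.out.println(swap("bob@example"));
        System.out.println(fillPrice("Total: PRICE", "$5.00"));
    }
}
```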
The following example counts the number of times the word "cat" appears in the input string:
Example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT = "cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT);   // get a matcher object
        int count = 0;

        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println("start(): " + m.start());
            System.out.println("end(): " + m.end());
        }
    }
}
This produces the following result:
Output
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
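You can see that this example uses word boundaries to ensure that the letters "c", "a", "t" are not merely a substring of a longer word. It also gives some useful information about where in the input string the match occurred.
The start method returns the start index of the subsequence captured by the given group during the previous match operation, and end returns the index of the last character matched, plus one.
The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to be matched, while lookingAt does not.
Both methods always start at the beginning of the input string. Here is an example demonstrating this behavior:
Example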
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);

        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
    }
}
This produces the following result:
Output
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false
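The replaceFirst and replaceAll methods replace text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurrences.
Here is an example demonstrating this behavior:
Example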
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " + "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}
This produces the following result:
Output
The cat says meow. All cats say meow.
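For comparison, here is a brief sketch of the same replacement done with replaceFirst, so that only the first "dog" changes. The class ReplaceFirstDemo and the helper replaceFirstDog() are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class ReplaceFirstDemo {
    // Replaces only the first occurrence of "dog" with "cat".
    static String replaceFirstDog(String input) {
        return Pattern.compile("dog").matcher(input).replaceFirst("cat");
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstDog("The dog says meow. All dogs say meow."));
    }
}
```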
The Matcher class also provides the appendReplacement and appendTail methods for text replacement.
Here is an example demonstrating this behavior:
Example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches {
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoob";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);

        // get a matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, REPLACE);
        }
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}
This produces the following result:
Output
-foo-foo-foo-
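With a fixed replacement string, the loop above does the same job as replaceAll. The append methods become useful when each replacement has to be computed from its match. The following is a hedged sketch of that idea, assuming every number in the input fits in an int; the class AppendReplacementDemo and the helper doubleNumbers() are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Replaces every run of digits with twice its value.
    // Assumption: each number fits in an int.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Copies the text before this match, then the computed replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Copies whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("3 apples and 10 pears"));
    }
}
```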
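A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong: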
Sr.No. | Method & Description |
---|---|
1 | public String getDescription(): Retrieves the description of the error. |
2 | public int getIndex(): Retrieves the error index. |
3 | public String getPattern(): Retrieves the erroneous regular expression pattern. |
4 | public String getMessage(): Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern. |
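These accessors are typically used in a catch block around Pattern.compile. The sketch below is one possible way to do that, not a prescribed approach: the class SyntaxErrorDemo and the helper describe() are hypothetical, and the exact wording returned by getDescription() depends on the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Returns "valid" if the regex compiles, otherwise a one-line summary
    // built from the exception's accessor methods.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return "valid";
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " at index " + e.getIndex()
                    + " in pattern " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("a*b"));
        System.out.println(describe("(unclosed"));
    }
}
```

Because the exception is unchecked, the compiler does not force this try/catch; it is worth adding mainly when the pattern comes from user input or configuration.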