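Regular Expression Boundary Matchers in Java
Prerequisite - Regular Expressions in Java
Boundary matchers let you specify where in a string a match must occur, which makes a pattern more precise. For example, you may be interested in a particular word only when it appears at the beginning or end of a line. Or you may want to know whether a match falls on a word boundary, or at the end of the previous match.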
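List of boundary matchers
- ^ : the beginning of a line; placed before the text to match
- $ : the end of a line; placed after the text to match
- \b : a word boundary; checks whether the pattern begins or ends on one
- \B : a non-word boundary
- \A : the beginning of the input
- \G : the end of the previous match
- \Z : the end of the input, except for the final line terminator, if any
- \z : the end of the input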
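The cases below cover ^, $, \b, \B and \G, but not \A, \Z and \z. The following is a minimal sketch of how those three behave when the input ends with a line terminator. The class name InputBoundaryDemo, the helper spans, and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InputBoundaryDemo
{
    // Hypothetical helper: returns every match of regex in txt as "start-end".
    static List<String> spans(String regex, String txt)
    {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(txt);
        while (matcher.find())
        {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Sample input (an assumption): it ends with a line terminator.
        String txt = "geeksforgeeks\n";

        // \A anchors the match at the start of the input.
        System.out.println(spans("\\Ageeks", txt)); // [0-5]

        // \Z accepts the end of the input even before a final line terminator.
        System.out.println(spans("geeks\\Z", txt)); // [8-13]

        // \z requires the absolute end of the input, so the "\n" prevents a match.
        System.out.println(spans("geeks\\z", txt)); // []
    }
}
```

The only difference between the last two calls is the case of the letter: \Z ignores a trailing line terminator, while \z does not.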
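Case 1: Matching a word with ^ and $
- ^ : matches the beginning of a line
- $ : matches the end of a line
By default, both anchor to the whole input; they match at the start and end of every line only when the pattern is compiled with Pattern.MULTILINE.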
Input : txt = "geeksforgeeks", regex = "^geeks"
Output : Found from index 0 to 5
Explanation : The result does not include the "geeks" after "for", because the regex uses ^.

Input : txt = "geeksforgeeks", regex = "geeks$"
Output : Found from index 8 to 13
Explanation : The result does not include the "geeks" before "for", because the regex uses $.

Input : txt = "geeksforgeeks", regex = "^geeks$"
Output : No match found
Explanation : This regex matches only the exact string "geeks".

Input : txt = " geeksforgeeks", regex = "^geeks"
Output : No match found
Explanation : The input string has extra whitespace at the beginning.

// The extra \ escapes the \ inside the Java string literal
Input : txt = " geeksforgeeks", regex = "^\\s+geeks"
Output : Found from index 0 to 6
Explanation : The pattern requires "geeks" after one or more whitespace characters.
// Java program to demonstrate that ^ matches the beginning of
// a line, and $ matches the end.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Reg
{
    public static void main(String[] args)
    {
        String txt = "geeksforgeeks";

        // Demonstrating ^
        String regex1 = "^geeks";
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE);
        Matcher matcher1 = pattern1.matcher(txt);
        while (matcher1.find())
        {
            System.out.println("Start index: " + matcher1.start());
            System.out.println("End index: " + matcher1.end());
        }

        // Demonstrating $
        String regex2 = "geeks$";
        Pattern pattern2 = Pattern.compile(regex2, Pattern.CASE_INSENSITIVE);
        Matcher matcher2 = pattern2.matcher(txt);
        while (matcher2.find())
        {
            System.out.println("\nStart index: " + matcher2.start());
            System.out.println("End index: " + matcher2.end());
        }
    }
}
Output:
Start index: 0
End index: 5
Start index: 8
End index: 13
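The input in the program above is a single line, so it does not show how ^ and $ treat multi-line text. The sketch below compares the default mode with Pattern.MULTILINE on a three-line string. The class name MultilineDemo, the helper spans, and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo
{
    // Hypothetical helper: collects each match as "start-end".
    // Pass 0 for flags to compile the pattern in the default mode.
    static List<String> spans(String regex, int flags, String txt)
    {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex, flags).matcher(txt);
        while (matcher.find())
        {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Sample input (an assumption): three lines, "geeks" on the first and last.
        String txt = "geeks\nfor\ngeeks";

        // Default mode: ^ and $ anchor to the whole input.
        System.out.println(spans("^geeks", 0, txt)); // [0-5]
        System.out.println(spans("geeks$", 0, txt)); // [10-15]

        // MULTILINE mode: ^ and $ also match at every line start and line end.
        System.out.println(spans("^geeks", Pattern.MULTILINE, txt)); // [0-5, 10-15]
        System.out.println(spans("geeks$", Pattern.MULTILINE, txt)); // [0-5, 10-15]
    }
}
```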
Case 2: Using \b to check whether a pattern begins or ends on a word boundary
Input : txt = "geeksforgeeks geekspractice", pat = "\\bgeeks"
Output : Found from index 0 to 5 and from index 14 to 19
Explanation : The pattern "geeks" appears at the beginning of two words, "geeksforgeeks" and "geekspractice".

Input : txt = "geeksforgeeks geekspractice", pat = "geeks\\b"
Output : Found from index 8 to 13
Explanation : The pattern "geeks" appears at the end of one word, "geeksforgeeks".
// Java program to demonstrate use of \b to match
// regex at beginning and end of word boundary
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Reg
{
    public static void main(String[] args)
    {
        String txt = "geeksforgeeks geekspractice";

        // Demonstrating beginning of word boundary
        String regex1 = "\\bgeeks"; // Matched at two places
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE);
        Matcher matcher1 = pattern1.matcher(txt);
        while (matcher1.find())
        {
            System.out.println("Start index: " + matcher1.start());
            System.out.println("End index: " + matcher1.end());
        }

        // Demonstrating end of word boundary
        String regex2 = "geeks\\b"; // Matched at one place
        Pattern pattern2 = Pattern.compile(regex2, Pattern.CASE_INSENSITIVE);
        Matcher matcher2 = pattern2.matcher(txt);
        while (matcher2.find())
        {
            System.out.println("\nStart index: " + matcher2.start());
            System.out.println("End index: " + matcher2.end());
        }
    }
}
Output:
Start index: 0
End index: 5
Start index: 14
End index: 19
Start index: 8
End index: 13
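A common way to combine the two forms shown above is to put \b on both sides of the pattern, so that only standalone words match. The following is a small sketch of that idea. The class name WholeWordDemo, the helper spans, and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordDemo
{
    // Hypothetical helper: collects each match as "start-end".
    static List<String> spans(String regex, String txt)
    {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(txt);
        while (matcher.find())
        {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Sample input (an assumption): "geeks" appears alone twice
        // and twice more inside "geeksforgeeks".
        String txt = "geeks geeksforgeeks geeks";

        // \b on both sides: only the standalone words match.
        System.out.println(spans("\\bgeeks\\b", txt)); // [0-5, 20-25]

        // No boundaries: every occurrence matches.
        System.out.println(spans("geeks", txt)); // [0-5, 6-11, 14-19, 20-25]
    }
}
```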
Case 3: Using \B to match an expression on a non-word boundary
Input : txt = "geeksforgeeks geekspractice", pat = "\\Bgeeks"
Output : Found from index 8 to 13
Explanation : Only one occurrence of "geeks" does not start at the beginning of a word: the one at the end of "geeksforgeeks".

Input : txt = "geeksforgeeks geekspractice", pat = "geeks\\B"
Output : Found from index 0 to 5 and from index 14 to 19
Explanation : Two occurrences of "geeks" do not end at the end of a word.
// Java program to demonstrate use of \B to match
// regex at beginning and end of non word boundary
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Reg
{
    public static void main(String[] args)
    {
        String txt = "geeksforgeeks geekspractice";

        // Demonstrating not at the beginning of a word
        String regex1 = "\\Bgeeks"; // Matches at one place
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE);
        Matcher matcher1 = pattern1.matcher(txt);
        while (matcher1.find())
        {
            System.out.println("Start index: " + matcher1.start());
            System.out.println("End index: " + matcher1.end() + "\n");
        }

        // Demonstrating not at the end of a word
        String regex2 = "geeks\\B"; // Matches at two places
        Pattern pattern2 = Pattern.compile(regex2, Pattern.CASE_INSENSITIVE);
        Matcher matcher2 = pattern2.matcher(txt);
        while (matcher2.find())
        {
            System.out.println("Start index: " + matcher2.start());
            System.out.println("End index: " + matcher2.end());
        }
    }
}
Output:
Start index: 8
End index: 13
Start index: 0
End index: 5
Start index: 14
End index: 19
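Placing \B on both sides of a pattern restricts matches to occurrences that sit strictly inside a longer word. The sketch below illustrates this. The class name InsideWordDemo, the helper spans, and the sample text (including the made-up word "mygeeksite") are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsideWordDemo
{
    // Hypothetical helper: collects each match as "start-end".
    static List<String> spans(String regex, String txt)
    {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(txt);
        while (matcher.find())
        {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Sample input (an assumption): "geeks" starts a word, ends a word,
        // and sits in the middle of the made-up word "mygeeksite".
        String txt = "geeksforgeeks mygeeksite";

        // \B on both sides: word characters must surround the match,
        // so only the occurrence inside "mygeeksite" qualifies.
        System.out.println(spans("\\Bgeeks\\B", txt)); // [16-21]
    }
}
```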
Case 4: Using \G to require that a match occurs only at the end of the previous match
Input : txt = "geeksgeeks geeks", pat = "\\Ggeeks"
Output : Found from index 0 to 5 and from index 5 to 10
Explanation : Only the first two occurrences of "geeks" match. The occurrence after the space does not match, because it does not start exactly where the previous match ended.
// Java program to demonstrate use of \G to require a match
// to occur only at the end of the previous match
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Reg
{
    public static void main(String[] args)
    {
        String txt = "geeksgeeks geeks";

        // Demonstrating \G
        String regex1 = "\\Ggeeks"; // Matches the first two "geeks"
        Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE);
        Matcher matcher1 = pattern1.matcher(txt);
        while (matcher1.find())
        {
            System.out.println("Start index: " + matcher1.start());
            System.out.println("End index: " + matcher1.end());
        }
    }
}
Output:
Start index: 0
End index: 5
Start index: 5
End index: 10
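One practical use of \G is reading a run of contiguous tokens and stopping at the first item that does not fit. The sketch below pulls comma-terminated numbers from the front of a string, with and without \G. The class name ContiguousTokensDemo, the helper numbers, and the sample text are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContiguousTokensDemo
{
    // Hypothetical helper: returns the text of capture group 1 for every match.
    static List<String> numbers(String regex, String txt)
    {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(txt);
        while (matcher.find())
        {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Sample input (an assumption): the "x" interrupts the run of numbers.
        String txt = "10,20,30,x,40,";

        // With \G each match must start where the previous one ended,
        // so matching stops at "x" and never reaches "40,".
        System.out.println(numbers("\\G(\\d+),", txt)); // [10, 20, 30]

        // Without \G the matcher skips over "x," and finds "40," as well.
        System.out.println(numbers("(\\d+),", txt)); // [10, 20, 30, 40]
    }
}
```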
Reference: https://docs.oracle.com/javase/tutorial/essential/regex/bounds.html