📅  Last modified: 2020-10-16 05:36:37             🧑  Author: Mango
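A regular expression is a string of characters that defines the pattern or patterns you are looking for. The syntax of regular expressions in Perl is very similar to that of other programs that support regular expressions, such as sed, grep, and awk.
The basic way to apply a regular expression is to use the pattern binding operators =~ and !~. The first operator tests whether a string matches (and, with the substitution and transliteration operators, also modifies the string); the second is its negation.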
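As a minimal sketch of the two binding operators, the snippet below tests one string with =~ and with !~. The variable names and the sample string are assumptions made for illustration only.

```perl
use strict;
use warnings;

my $line = "Perl regular expressions";

# =~ is true when the pattern matches the string on its left.
my $has_regular = ($line =~ /regular/) ? 1 : 0;

# !~ is the negation: true when the pattern does NOT match.
my $lacks_python = ($line !~ /Python/) ? 1 : 0;

print "contains 'regular': $has_regular\n";
print "lacks 'Python': $lacks_python\n";
```

Both flags come out as 1 for this input.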
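Perl has three regular expression operators: the match operator m//, the substitution operator s///, and the transliteration operator tr///.
In each case the forward slashes act as delimiters for the regular expression (regex) that you specify. If you prefer another delimiter, you can use it in place of the forward slash.
The match operator, m//, is used to match a string or statement against a regular expression. For example, to match the character sequence "foo" against the scalar $bar, you might use a statement like this: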
#!/usr/bin/perl
$bar = "This is foo and again foo";
if ($bar =~ /foo/) {
    print "First time is matching\n";
} else {
    print "First time is not matching\n";
}
$bar = "foo";
if ($bar =~ /foo/) {
    print "Second time is matching\n";
} else {
    print "Second time is not matching\n";
}
When the above program is executed, it produces the following result:
First time is matching
Second time is matching
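The m// operator actually works in the same fashion as the q// family of operators: you can use any combination of naturally matching characters as delimiters for the expression. For example, m{}, m(), and m<> are all valid. So the above example can be rewritten as follows: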
#!/usr/bin/perl
$bar = "This is foo and again foo";
if ($bar =~ m[foo]) {
    print "First time is matching\n";
} else {
    print "First time is not matching\n";
}
$bar = "foo";
if ($bar =~ m{foo}) {
    print "Second time is matching\n";
} else {
    print "Second time is not matching\n";
}
You can omit the m from m// if the delimiters are forward slashes, but for all other delimiters you must use the m prefix.
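Alternative delimiters are most useful when the pattern itself contains slashes. The sketch below compares the two spellings on a sample path; the path and variable names are assumed for illustration.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# With slash delimiters every literal slash must be escaped.
my $with_slashes = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# With m{...} the same pattern needs no escaping.
my $with_braces = ($path =~ m{^/usr/local/}) ? 1 : 0;

print "slash delimiters: $with_slashes, brace delimiters: $with_braces\n";
```

Both forms match the same text; the brace form is simply easier to read.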
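Note that the entire match expression, that is, the expression on the left of =~ or !~ together with the match operator, returns true (in scalar context) if the expression matches. Therefore the statement: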
$true = ($foo =~ m/foo/);
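sets $true to 1 if $foo matches the regex, or to an empty string (which is false, and 0 in numeric context) if the match fails. In list context, the match returns the contents of any grouped expressions. For example, when extracting the hours, minutes, and seconds from a time string, we can use: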
my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
The match operator supports its own set of modifiers. The /g modifier allows global matching. The /i modifier makes the match case insensitive. Here is the complete list of modifiers:
Sr.No. | Modifier | Description |
---|---|---|
1 | i | Makes the match case insensitive. |
2 | m | Treats the string as multiple lines: ^ and $ match at the start and end of each line (next to an embedded newline), not only at the start and end of the string. |
3 | o | Evaluates the expression only once. |
4 | s | Allows use of . to match a newline character. |
5 | x | Allows you to use white space in the expression for clarity. |
6 | g | Globally finds all matches. |
7 | cg | Allows the search to continue even after a global match fails (the search position is not reset). |
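The following sketch exercises several of these modifiers on one multi-line string. The sample text and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $text = "Foo one\nfoo two\nFOO three";

# /i ignores case; /g in list context returns every match.
my @all = ($text =~ /foo/gi);

# /m lets ^ match at the start of each line, not just the string.
my @words = ($text =~ /^foo (\w+)/gim);

# /s lets . match a newline, so .* can span lines.
my ($span) = ($text =~ /one(.*)two/s);

# /x allows whitespace and comments inside the pattern.
my ($last) = ($text =~ / FOO \s+ (\w+)   # word after upper-case FOO
                       /x);

print scalar(@all), " matches; words: @words; last: $last\n";
```

Here @all holds three matches, @words holds one, two, and three, and $span contains the newline between the first two lines.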
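There is also a simpler version of the match operator: the ?PATTERN? operator. It is basically identical to the m// operator, except that it matches only once within the string you are searching between each call to reset. In Perl 5.22 and later the bare ?PATTERN? form is no longer accepted and it must be written with the m prefix, as m?PATTERN?.
For example, you can use this to get the first and last elements within a list:
#!/usr/bin/perl
@list = qw/food foosball subeo footnote terfoot canic footbrdige/;
foreach (@list) {
    $first = $1 if m?(foo.*)?;   # matches only once, so $first keeps the first hit
    $last = $1 if /(foo.*)/;     # matches every time, so $last ends up as the last hit
}
print "First: $first, Last: $last\n";
When the above program is executed, it produces the following result:
First: food, Last: footbrdige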
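Regular expression variables include $+, which contains whatever the last grouping match matched; $&, which contains the entire matched string; $`, which contains everything before the matched string; and $', which contains everything after the matched string. The following code demonstrates the result: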
#!/usr/bin/perl
$string = "The food is in the salad bar";
$string =~ m/foo/;
print "Before: $`\n";
print "Matched: $&\n";
print "After: $'\n";
When the above program is executed, it produces the following result:
Before: The
Matched: foo
After: d is in the salad bar
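Because these special variables are overwritten by the next successful match, it is usually safer to copy them straight away. The sketch below does that and also reads $+; the two capture groups are added here only so that $+ has something to report.

```perl
use strict;
use warnings;

my $string = "The food is in the salad bar";
my ($before, $matched, $after, $last_group);

if ($string =~ /(f)(oo)/) {
    # Copy the special variables right away: the next successful
    # match will overwrite them.
    ($before, $matched, $after) = ($`, $&, $');
    $last_group = $+;    # text of the last group that matched
}
print "[$before][$matched][$after][$last_group]\n";
```

For this input the saved values are "The ", "foo", "d is in the salad bar", and "oo".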
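The substitution operator, s///, is really just an extension of the match operator that allows you to replace the matched text with some new text. The basic form of the operator is: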
s/PATTERN/REPLACEMENT/;
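PATTERN is the regular expression for the text that we are looking for. REPLACEMENT is a specification for the text or expression that we want to use to replace the found text. For example, we can replace the first occurrence of cat with dog using the following substitution (adding the /g modifier would replace every occurrence):
#!/usr/bin/perl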
$string = "The cat sat on the mat";
$string =~ s/cat/dog/;
print "$string\n";
When the above program is executed, it produces the following result:
The dog sat on the mat
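To illustrate the difference between replacing one occurrence and replacing all of them, here is a small sketch that also captures the return value of s///, which is the number of substitutions made. The longer sample sentence is an assumption for illustration.

```perl
use strict;
use warnings;

my $string = "The cat sat on the mat; another cat slept";

# Without /g only the first occurrence is replaced.
(my $first_only = $string) =~ s/cat/dog/;

# With /g every occurrence is replaced; s/// returns the number
# of substitutions it made.
my $count = ($string =~ s/cat/dog/g);

print "$first_only\n$string\nreplaced: $count\n";
```

The copy-then-substitute idiom on the first statement leaves the original string untouched until the second substitution runs.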
Here is the list of all the modifiers that can be used with the substitution operator.
Sr.No. | Modifier | Description |
---|---|---|
1 | i | Makes the match case insensitive. |
2 | m | Treats the string as multiple lines: ^ and $ match at the start and end of each line (next to an embedded newline), not only at the start and end of the string. |
3 | o | Evaluates the expression only once. |
4 | s | Allows use of . to match a newline character. |
5 | x | Allows you to use white space in the expression for clarity. |
6 | g | Replaces all occurrences of the found expression with the replacement text. |
7 | e | Evaluates the replacement as if it were a Perl statement, and uses its return value as the replacement text. |
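The /e modifier is the least obvious entry in the table, so here is a hedged sketch of it next to /i. The sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

my $prices = "apple 3, pear 5";

# /e evaluates the replacement as Perl code; here each number is doubled.
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

# /i makes the match case insensitive.
(my $fixed = "PERL is fun") =~ s/perl/Perl/i;

print "$doubled\n$fixed\n";
```

With /e the replacement side is an expression, so $1 * 2 is computed for each match instead of being inserted as literal text.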
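Transliteration is similar, but not identical, to substitution: unlike substitution, translation (or transliteration) does not use regular expressions to search for the values to replace. The translation operators are: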
tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds
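The translation replaces every occurrence of a character in SEARCHLIST with the corresponding character in REPLACEMENTLIST. For example, using the "The cat sat on the mat." string we have been using in this chapter:
#!/usr/bin/perl
$string = 'The cat sat on the mat.';
$string =~ tr/a/o/;
print "$string\n";
When the above program is executed, it produces the following result:
The cot sot on the mot.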
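Standard Perl ranges can also be used, allowing you to specify ranges of characters either by letter or by numerical value. To change the case of a string, you might use the following syntax in place of the uc function.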
$string =~ tr/a-z/A-Z/;
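The tr operator also returns the number of characters it matched, which makes it a convenient character counter. The sketch below shows that alongside the range form; the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = 'The cat sat on the mat.';

# tr returns the number of characters it matched. With an empty
# replacement list and no /d, the string is left unchanged, which
# makes this a handy way to count characters.
my $count_a = ($string =~ tr/a//);

# Ranges work on both sides; this upper-cases ASCII letters only.
(my $upper = $string) =~ tr/a-z/A-Z/;

print "a appears $count_a times\n$upper\n";
```

Note that tr/a-z/A-Z/ only covers ASCII letters, whereas uc also handles other alphabets.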
Following is the list of modifiers related to translation.
Sr.No. | Modifier | Description |
---|---|---|
1 | c | Complements SEARCHLIST. |
2 | d | Deletes found but unreplaced characters. |
3 | s | Squashes duplicate replaced characters. |
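The /d modifier deletes the characters matching SEARCHLIST that do not have a corresponding entry in REPLACEMENTLIST. For example:
#!/usr/bin/perl
$string = 'the cat sat on the mat.';
$string =~ tr/a-z/b/d;
print "$string\n";
When the above program is executed, it produces the following result (each a becomes b, every other lowercase letter is deleted, and the spaces and the period are left untouched):
 b b   b.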
The last modifier, /s, removes duplicate sequences of characters that were replaced, so:
#!/usr/bin/perl
$string = 'food';
$string =~ tr/a-z/a-z/s;
print "$string\n";
When the above program is executed, it produces the following result:
fod
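The /c modifier has no example above, so here is a minimal sketch of it combined with /d and /s. The variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = 'the cat sat on the mat.';

# /c complements the search list: here "everything that is not a
# lower-case letter". Combined with /d those characters are deleted.
(my $letters_only = $string) =~ tr/a-z//cd;

# /c with /s: replace each run of non-letters by a single underscore.
(my $slug = $string) =~ tr/a-z/_/cs;

print "$letters_only\n$slug\n";
```

The first result keeps only the letters; the second turns the spaces and the final period into single underscores.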
You don't just have to match on fixed strings. In fact, you can match just about anything you can dream of by using more complex regular expressions. Here's a quick cheat sheet.
The following table lists the regular expression syntax that is available in Perl.
Sr.No. | Pattern | Description |
---|---|---|
1 | ^ | Matches beginning of line. |
2 | $ | Matches end of line. |
3 | . | Matches any single character except newline. Using the s option allows it to match newline as well. |
4 | […] | Matches any single character in brackets. |
5 | [^…] | Matches any single character not in brackets. |
6 | * | Matches 0 or more occurrences of preceding expression. |
7 | + | Matches 1 or more occurrences of preceding expression. |
8 | ? | Matches 0 or 1 occurrence of preceding expression. |
9 | {n} | Matches exactly n occurrences of preceding expression. |
10 | {n,} | Matches n or more occurrences of preceding expression. |
11 | {n,m} | Matches at least n and at most m occurrences of preceding expression. |
12 | a\|b | Matches either a or b. |
13 | \w | Matches word characters. |
14 | \W | Matches nonword characters. |
15 | \s | Matches whitespace. Equivalent to [ \t\n\r\f]. |
16 | \S | Matches nonwhitespace. |
17 | \d | Matches digits. Equivalent to [0-9]. |
18 | \D | Matches nondigits. |
19 | \A | Matches beginning of string. |
20 | \Z | Matches end of string. If a newline exists, it matches just before newline. |
21 | \z | Matches end of string. |
22 | \G | Matches point where last match finished. |
23 | \b | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
24 | \B | Matches nonword boundaries. |
25 | \n, \t, etc. | Matches newlines, carriage returns, tabs, etc. |
26 | \1…\9 | Matches nth grouped subexpression. |
27 | \10 | Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code. |
28 | [aeiou] | Matches a single character in the given set. |
29 | [^aeiou] | Matches a single character outside the given set. |
The ^ metacharacter matches the beginning of the string and the $ metacharacter matches the end of the string. Here are some brief examples.
# nothing in the string (start and end are adjacent)
/^$/
# three digits, each followed by a whitespace
# character (eg "3 4 5 ")
/(\d\s){3}/
# matches a string in which every
# odd-numbered letter is a (eg "abacadaf")
/^(a.)+$/
# string starts with one or more digits
/^\d+/
# string that ends with one or more digits
/\d+$/
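One way to try such patterns is to pair each with a sample string and print whether it matches. The sketch below uses qr// to store the patterns; the table layout, the sample strings, and the extra anchors on the second pattern are assumptions for illustration.

```perl
use strict;
use warnings;

# Each entry: [ description, pattern, sample string ]
my @checks = (
    [ 'empty string',            qr/^$/,          ''       ],
    [ 'three digit+space pairs', qr/^(\d\s){3}$/, '3 4 5 ' ],
    [ 'starts with digits',      qr/^\d+/,        '42abc'  ],
    [ 'ends with digits',        qr/\d+$/,        'abc42'  ],
);

my @results;
for my $check (@checks) {
    my ($name, $re, $sample) = @$check;
    my $ok = ($sample =~ $re) ? 'match' : 'no match';
    push @results, $ok;
    print "$name: '$sample' => $ok\n";
}
```

Every sample here is chosen to match its pattern, so each line reports a match.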
Let's have a look at another example.
#!/usr/bin/perl
$string = "Cats go Catatonic\nWhen given Catnip";
($start) = ($string =~ /\A(.*?) /);
@lines = $string =~ /^(.*?) /gm;
print "First word: $start\n","Line starts: @lines\n";
When the above program is executed, it produces the following result:
First word: Cats
Line starts: Cats When
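\b matches at any word boundary, as defined by the difference between the \w class and the \W class. Because \w includes the characters for a word and \W the opposite, this normally means the termination of a word. The \B assertion matches any position that is not a word boundary. For example:
/\bcat\b/ # Matches 'the cat sat' but not 'catatonic'
/\Bcat\B/ # Matches 'verification' but not 'the cat on the mat'
/\bcat\B/ # Matches 'catatonic' but not 'polecat'
/\Bcat\b/ # Matches 'polecat' but not 'catatonic'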
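The four boundary combinations can be checked against a handful of sample words. This is a minimal sketch; the sample list and the %matches hash are assumptions for illustration.

```perl
use strict;
use warnings;

my @samples  = ('the cat sat', 'catatonic', 'polecat', 'verification');
my @patterns = ('\bcat\b', '\Bcat\B', '\bcat\B', '\Bcat\b');

my %matches;
for my $p (@patterns) {
    my $re = qr/$p/;    # compile the pattern text into a regex
    $matches{$p} = [ grep { $_ =~ $re } @samples ];
    print "$p matches: @{ $matches{$p} }\n";
}
```

Each pattern ends up matching exactly one of the four samples, in the order they are listed.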
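The | character is just like the standard or bitwise OR within Perl. It specifies alternate matches within a regular expression or group. For example, to match "cat" or "dog" in an expression, you might use this: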
if ($string =~ /cat|dog/)
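You can group individual elements of an expression together in order to support complex matches. Searching for two people's names could be achieved with two separate tests, like this: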
if (($string =~ /Martin Brown/) || ($string =~ /Sharon Brown/))
This could be written as follows:
if ($string =~ /(Martin|Sharon) Brown/)
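The group matters because | otherwise splits the whole pattern. The following sketch contrasts the grouped and ungrouped forms on a sample sentence, which is assumed for illustration.

```perl
use strict;
use warnings;

my $string = "Sharon Brown met Martin Brown and Martin Green";

# The group limits the alternation to the first name; /g in list
# context returns what the group captured for every match.
my @first_names = ($string =~ /(Martin|Sharon) Brown/g);

# Without the group, | splits the whole pattern: this matches
# either "Martin" alone or "Sharon Brown".
my @loose = ($string =~ /Martin|Sharon Brown/g);

print "grouped: @first_names\n";
print "ungrouped: ", join(", ", @loose), "\n";
```

The ungrouped pattern also picks up the Martin in "Martin Green", which the grouped pattern correctly skips.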
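From a regular expression point of view, there is no difference between the following two patterns, except perhaps that the former is slightly clearer.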
$string =~ /(\S+)\s+(\S+)/;
and
$string =~ /\S+\s+\S+/;
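However, the benefit of grouping is that it allows us to extract a sequence from a regular expression. Groupings are returned as a list in the order in which they appear in the original pattern. For example, in the following fragment we pull out the hours, minutes, and seconds from a string.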
my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
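As well as this direct method, matched groups are also available in the special $x variables, where x is the number of the group within the regular expression. We could therefore rewrite the preceding example as follows: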
#!/usr/bin/perl
$time = "12:05:30";
$time =~ m/(\d+):(\d+):(\d+)/;
my ($hours, $minutes, $seconds) = ($1, $2, $3);
print "Hours : $hours, Minutes: $minutes, Second: $seconds\n";
When the above program is executed, it produces the following result:
Hours : 12, Minutes: 05, Second: 30
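One caveat with the numbered variables is that they keep their values from the last successful match, so it is safer to read them only after checking that the match succeeded. The sketch below shows that pattern; the input list and the output format are assumptions for illustration.

```perl
use strict;
use warnings;

my @inputs = ("12:05:30", "not a time");   # sample values, assumed
my @parsed;

for my $time (@inputs) {
    # Only read $1, $2, $3 inside the branch where the match succeeded.
    if ($time =~ m/(\d+):(\d+):(\d+)/) {
        push @parsed, "${1}h ${2}m ${3}s";
    } else {
        push @parsed, "unparsed";
    }
}
print join("; ", @parsed), "\n";
```

Without the if test, the second input would silently reuse the groups captured from the first.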
When groups are used in substitution expressions, the $x syntax can be used in the replacement text. Thus, we could reformat a date string using this:
#!/usr/bin/perl
$date = '03/26/1999';
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;
print "$date\n";
When the above program is executed, it produces the following result:
1999/03/26
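The \G assertion allows you to continue searching from the point where the last match occurred. For example, in the following code we use \G so that we can search to the correct position and then extract some information, without having to create a more complex, single regular expression: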
#!/usr/bin/perl
$string = "The time is: 12:31:02 on 4/12/00";
$string =~ /:\s+/g;
($time) = ($string =~ /\G(\d+:\d+:\d+)/);
$string =~ /.+\s+/g;
($date) = ($string =~ m{\G(\d+/\d+/\d+)});
print "Time: $time, Date: $date\n";
When the above program is executed, it produces the following result:
Time: 12:31:02, Date: 4/12/00
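The \G assertion is actually just the metasymbol equivalent of the pos function, so between regular expression calls you can continue to use pos, and even modify the value of pos (and therefore \G) by using pos as an lvalue subroutine.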
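As a sketch of pos used as an lvalue, the snippet below reads the position left by a /g match and then moves it by hand before anchoring with \G. Using index to find the new position is an assumption made for illustration.

```perl
use strict;
use warnings;

my $string = "The time is: 12:31:02 on 4/12/00";

# A scalar /g match advances pos() to the end of the match.
$string =~ /:\s+/g;
my $after_colon = pos($string);

# Assigning to pos() moves the point where \G will anchor.
pos($string) = index($string, "4/");
my ($date) = ($string =~ m{\G(\d+/\d+/\d+)});

print "pos after ': ' = $after_colon, date = $date\n";
```

Setting pos directly replaces the intermediate /.+\s+/g match used in the earlier example.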
Literal characters match themselves:
Sr.No. | Example | Description |
---|---|---|
1 | Perl | Matches "Perl". |
Character classes:
Sr.No. | Example | Description |
---|---|---|
1 | [Pp]ython | Matches "Python" or "python" |
2 | rub[ye] | Matches "ruby" or "rube" |
3 | [aeiou] | Matches any one lowercase vowel |
4 | [0-9] | Matches any digit; same as [0123456789] |
5 | [a-z] | Matches any lowercase ASCII letter |
6 | [A-Z] | Matches any uppercase ASCII letter |
7 | [a-zA-Z0-9] | Matches any of the above |
8 | [^aeiou] | Matches anything other than a lowercase vowel |
9 | [^0-9] | Matches anything other than a digit |
Special character classes:
Sr.No. | Example | Description |
---|---|---|
1 | . | Matches any character except newline |
2 | \d | Matches a digit: [0-9] |
3 | \D | Matches a nondigit: [^0-9] |
4 | \s | Matches a whitespace character: [ \t\r\n\f] |
5 | \S | Matches nonwhitespace: [^ \t\r\n\f] |
6 | \w | Matches a single word character: [A-Za-z0-9_] |
7 | \W | Matches a nonword character: [^A-Za-z0-9_] |
Repetition:
Sr.No. | Example | Description |
---|---|---|
1 | ruby? | Matches "rub" or "ruby": the y is optional |
2 | ruby* | Matches "rub" plus 0 or more ys |
3 | ruby+ | Matches "rub" plus 1 or more ys |
4 | \d{3} | Matches exactly 3 digits |
5 | \d{3,} | Matches 3 or more digits |
6 | \d{3,5} | Matches 3, 4, or 5 digits |
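Nongreedy repetition matches the smallest possible number of repetitions:
Sr.No. | Example | Description |
---|---|---|
1 | <.*> | Greedy repetition: matches from the first < to the last > in the string |
2 | <.*?> | Nongreedy: matches from the first < to the nearest following > |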
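The difference is easiest to see on a string that contains more than one pair of angle brackets. This is a minimal sketch; the sample markup is an assumption for illustration.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # sample input, assumed

my ($greedy)    = ($html =~ /(<.*>)/);    # first < to the LAST >
my ($nongreedy) = ($html =~ /(<.*?>)/);   # first < to the NEXT >

print "greedy:    $greedy\n";
print "nongreedy: $nongreedy\n";
```

The greedy form swallows the whole string, while the nongreedy form stops at the first closing bracket.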
Grouping with parentheses:
Sr.No. | Example | Description |
---|---|---|
1 | \D\d+ | No group: + repeats \d |
2 | (\D\d)+ | Grouped: + repeats \D\d pair |
3 | ([Pp]ython(, )?)+ | Matches "Python", "Python, python, python", etc. |
Backreferences match a previously matched group again:
Sr.No. | Example | Description |
---|---|---|
1 | ([Pp])ython&\1ails | Matches python&pails or Python&Pails |
2 | (['"]).*?\1 | Single or double-quoted string. \1 matches whatever the 1st group matched, so the closing quote must be the same as the opening one. \2 matches whatever the 2nd group matched, etc. |
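A backreference can be tried out on quoted strings, where the closing quote has to repeat the opening one. The sketch below uses a nongreedy body between the quotes; the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

my @samples = (q{say "hello" now}, q{say 'hello' now}, q{say "hello' now});
my @quoted;

for my $s (@samples) {
    # \1 must repeat whatever quote character group 1 captured.
    if ($s =~ /(['"])(.*?)\1/) {
        push @quoted, $2;
    } else {
        push @quoted, undef;
    }
}
print join(", ", map { defined $_ ? $_ : "(no match)" } @quoted), "\n";
```

The third sample opens with a double quote and closes with a single quote, so it does not match.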
Alternatives:
Sr.No. | Example | Description |
---|---|---|
1 | python\|perl | Matches "python" or "perl" |
2 | rub(y\|le) | Matches "ruby" or "ruble" |
3 | Python(!+\|\?) | "Python" followed by one or more ! or one ? |
Anchors specify the match position:
Sr.No. | Example | Description |
---|---|---|
1 | ^Python | Matches "Python" at the start of a string or, with the m modifier, at the start of an internal line |
2 | Python$ | Matches "Python" at the end of a string or, with the m modifier, at the end of a line |
3 | \APython | Matches "Python" at the start of a string |
4 | Python\Z | Matches "Python" at the end of a string |
5 | \bPython\b | Matches "Python" at a word boundary |
6 | \brub\B | \B is nonword boundary: matches "rub" in "rube" and "ruby" but not alone |
7 | Python(?=!) | Matches "Python", if followed by an exclamation point |
8 | Python(?!!) | Matches "Python", if not followed by an exclamation point |
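The two lookahead forms at the end of the table can be sketched as follows. The sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

my @samples = ("Python!", "Python?", "Python");

# (?=!) requires a following "!" without consuming it;
# (?!!) requires that no "!" follows.
my @followed     = grep { /Python(?=!)/ } @samples;
my @not_followed = grep { /Python(?!!)/ } @samples;

# The lookahead is not part of the match itself.
my ($matched) = ("Python!" =~ /(Python(?=!))/);

print "followed by !: @followed\n";
print "not followed by !: @not_followed\n";
print "matched text: $matched\n";
```

Even when the lookahead succeeds, the captured text is just "Python", without the exclamation point.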
Special syntax with parentheses:
Sr.No. | Example | Description |
---|---|---|
1 | R(?#comment) | Matches "R". All the rest is a comment |
2 | R(?i)uby | Case-insensitive while matching "uby" |
3 | R(?i:uby) | Same as above |
4 | rub(?:y\|le) | Groups only, without creating a \1 backreference |
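A short sketch of the non-capturing group and the inline (?i:...) modifier follows. The sample strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# (?:...) groups without capturing, so the year stays in $1.
my ($year) = ("5 rubles in 1999" =~ /rub(?:y|le)s? in (\d+)/);

# (?i:...) turns on case-insensitivity for part of the pattern only.
my $partial_i = ("RUBY" =~ /R(?i:uby)/) ? 1 : 0;   # "UBY" matches uby
my $strict_r  = ("ruby" =~ /R(?i:uby)/) ? 1 : 0;   # leading "r" is not "R"

print "year=$year partial_i=$partial_i strict_r=$strict_r\n";
```

Because the alternation group does not capture, the digits are the first and only captured group.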