📅  Last modified: 2020-12-23 05:23:21             🧑  Author: Mango
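A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.
The Python module re provides full support for Perl-like regular expressions. If an error occurs while compiling or using a regular expression, the re module raises the exception re.error.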
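As a minimal sketch of the error behavior just described, the snippet below wraps re.compile in a helper that catches re.error. The helper name try_compile and the sample patterns are made up for illustration.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is invalid."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # The exception text describes what is wrong with the pattern
        print(f"invalid pattern {pattern!r}: {exc}")
        return None

ok = try_compile(r"(\d+)")   # valid pattern
bad = try_compile(r"(\d+")   # unbalanced parenthesis, so re.error is raised
```

Catching re.error is mostly useful when the pattern comes from user input or a configuration file rather than from a literal in the source code.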
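We will cover two important functions that are used to handle regular expressions. But first, note that various characters have a special meaning when they appear in a regular expression. To avoid confusion when working with regular expressions, we write patterns as raw strings, in the form r'expression'.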
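The difference a raw string makes can be seen with \b, which Python itself treats as a backspace character in an ordinary string literal. The sample text below is an assumption chosen for illustration.

```python
import re

# In a normal string literal "\b" is the backspace character (0x08).
# In a raw string r"\b" stays as backslash + "b", which re reads as a word boundary.
normal = "\bcat\b"
raw = r"\bcat\b"

text = "a cat sat"
normal_result = re.search(normal, text)   # looks for literal backspace characters
raw_result = re.search(raw, text)         # looks for the whole word "cat"

print(normal_result)       # None
print(raw_result.group())  # cat
```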
The re.match function attempts to match an RE pattern against the beginning of a string, with optional flags.
Here is the syntax for this function:
re.match(pattern, string, flags=0)
Here is the description of the parameters:
Sr.No. | Parameter | Description |
---|---|---|
1 | `pattern` | The regular expression to be matched. |
2 | `string` | The string to be searched; the pattern is matched at the beginning of this string. |
3 | `flags` | Optional modifiers, which can be combined using bitwise OR (`\|`). The available modifiers are listed in the table below. |
The re.match function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.
Sr.No. | Match Object Method | Description |
---|---|---|
1 | `group(num=0)` | Returns the entire match (or the specific subgroup num). |
2 | `groups()` | Returns all matching subgroups in a tuple (empty if there weren't any). |
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# re.I ignores case; re.M only changes how ^ and $ behave, so it has no effect here
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")
When the above code is executed, it produces the following result:
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
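The groups() method from the table above is not exercised by the example, so here is a small sketch that applies it to the same sentence. The variable names are illustrative only; span() is another match object method from the re module.

```python
import re

line = "Cats are smarter than dogs"
m = re.match(r"(.*) are (.*?) .*", line, re.I)

all_groups = m.groups()     # every captured subgroup, as a tuple
pair = m.group(1, 2)        # several groups requested in one call
second_span = m.span(2)     # (start, end) offsets of group 2 in the string

print(all_groups)   # ('Cats', 'smarter')
print(pair)         # ('Cats', 'smarter')
print(second_span)  # (9, 16)
```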
The re.search function searches for the first occurrence of an RE pattern within a string, with optional flags.
Here is the syntax for this function:
re.search(pattern, string, flags=0)
Here is the description of the parameters:
Sr.No. | Parameter | Description |
---|---|---|
1 | `pattern` | The regular expression to be matched. |
2 | `string` | The string to be searched; the pattern may match anywhere in this string. |
3 | `flags` | Optional modifiers, which can be combined using bitwise OR (`\|`). The available modifiers are listed in the table below. |
The re.search function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.
Sr.No. | Match Object Method | Description |
---|---|---|
1 | `group(num=0)` | Returns the entire match (or the specific subgroup num). |
2 | `groups()` | Returns all matching subgroups in a tuple (empty if there weren't any). |
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# Same pattern as before, but search() may match anywhere in the string
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() :", searchObj.group())
    print("searchObj.group(1) :", searchObj.group(1))
    print("searchObj.group(2) :", searchObj.group(2))
else:
    print("Nothing found!!")
When the above code is executed, it produces the following result:
searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter
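Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# match() only succeeds if the pattern matches at position 0
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")

# search() scans the whole string for the first occurrence
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() :", searchObj.group())
else:
    print("Nothing found!!")
When the above code is executed, it produces the following result: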
No match!!
search --> searchObj.group() : dogs
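One detail that is easy to miss: even with re.M, match still only tries the very start of the string, whereas search with ^ can land on a later line. A short sketch, using a two-line sample string invented for illustration:

```python
import re

text = "first line\ndogs on the second line"

# match() is anchored to position 0, so re.M does not help it reach line 2
match_result = re.match(r"^dogs", text, re.M)

# search() with re.M lets ^ match right after the newline
search_result = re.search(r"^dogs", text, re.M)

print(match_result)            # None
print(search_result.start())   # 11
```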
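sub is one of the most important re methods that use regular expressions.
re.sub(pattern, repl, string, count=0, flags=0)
This method replaces all occurrences of the RE pattern in string with repl, unless count is given, in which case at most count occurrences are replaced. It returns the modified string.
#!/usr/bin/python3
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num :", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num :", num)
When the above code is executed, it produces the following result: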
Phone Num : 2004-959-559
Phone Num : 2004959559
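The limit on the number of replacements is spelled count in the re module. The sketch below contrasts an unlimited substitution with one capped at a single replacement; the variable names are illustrative.

```python
import re

phone = "2004-959-559"

# Default: every hyphen is replaced
no_hyphens = re.sub(r"-", "", phone)

# count=1: only the first hyphen is replaced
first_only = re.sub(r"-", "", phone, count=1)

print(no_hyphens)   # 2004959559
print(first_only)   # 2004959-559
```

Passing count by keyword keeps the call unambiguous, since the next positional slot after the string is easy to confuse with flags.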
Regular expression literals may include optional modifiers that control various aspects of matching. The modifiers are specified as optional flags. You can provide multiple modifiers by combining them with bitwise OR (|), as shown previously, and each may be represented by one of these:
Sr.No. | Modifier | Description |
---|---|---|
1 | `re.I` | Performs case-insensitive matching. |
2 | `re.L` | Interprets words according to the current locale. This interpretation affects the alphabetic group (`\w` and `\W`), as well as word boundary behavior (`\b` and `\B`). In Python 3 it can only be used with bytes patterns. |
3 | `re.M` | Makes `$` match the end of a line (not just the end of the string) and makes `^` match the start of any line (not just the start of the string). |
4 | `re.S` | Makes a period (dot) match any character, including a newline. |
5 | `re.U` | Interprets letters according to the Unicode character set. This flag affects the behavior of `\w`, `\W`, `\b`, `\B`. It is already the default for str patterns in Python 3. |
6 | `re.X` | Permits more readable (verbose) regular expression syntax. It ignores whitespace (except inside a set `[]` or when escaped by a backslash) and treats an unescaped `#` as a comment marker. |
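To make the modifiers more concrete, here is a small sketch that exercises re.I, re.M, re.S and re.X. The sample text and variable names are assumptions for illustration; re.findall is the re function that returns all non-overlapping matches as a list.

```python
import re

text = "Python\npython\nPYTHON"

# re.I | re.M: ignore case, and let ^ and $ match at every line
all_lines = re.findall(r"^python$", text, re.I | re.M)
print(all_lines)   # ['Python', 'python', 'PYTHON']

# Without re.S the dot stops at a newline; with re.S it matches it
no_dotall = re.search(r"Python.python", text)
with_dotall = re.search(r"Python.python", text, re.S)
print(no_dotall, with_dotall is not None)   # None True

# re.X: whitespace and # comments inside the pattern are ignored
phone_re = re.compile(r"""
    (\d{4})   # first block of digits
    -         # literal hyphen
    (\d{3})   # second block of digits
""", re.X)
blocks = phone_re.match("2004-959").groups()
print(blocks)   # ('2004', '959')
```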
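Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.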
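A brief sketch of escaping, using a made-up price string: the first search escapes the control characters by hand, and the second uses re.escape from the re module to do the same automatically.

```python
import re

price = "Total: $5.00 (approx.)"

# Escape $ and . by hand so they are matched literally
by_hand = re.search(r"\$5\.00", price).group()

# re.escape() adds the backslashes for an arbitrary literal string
escaped = re.search(re.escape("$5.00"), price).group()

# Unescaped, the dot is a wildcard and also matches other characters
wildcard = re.search(r"5.00", "5x00")

print(by_hand, escaped, wildcard is not None)   # $5.00 $5.00 True
```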
The following table lists the regular expression syntax available in Python. A few entries come from Perl-style engines and are noted where Python's re module behaves differently.
Sr.No. | Pattern | Description |
---|---|---|
1 | `^` | Matches beginning of line. |
2 | `$` | Matches end of line. |
3 | `.` | Matches any single character except newline. Using the s option (re.S) allows it to match newline as well. |
4 | `[...]` | Matches any single character in brackets. |
5 | `[^...]` | Matches any single character not in brackets. |
6 | `re*` | Matches 0 or more occurrences of the preceding expression. |
7 | `re+` | Matches 1 or more occurrences of the preceding expression. |
8 | `re?` | Matches 0 or 1 occurrence of the preceding expression. |
9 | `re{n}` | Matches exactly n occurrences of the preceding expression. |
10 | `re{n,}` | Matches n or more occurrences of the preceding expression. |
11 | `re{n,m}` | Matches at least n and at most m occurrences of the preceding expression. |
12 | `a\|b` | Matches either a or b. |
13 | `(re)` | Groups regular expressions and remembers the matched text. |
14 | `(?imx)` | Turns on the i, m, or x options. In Python the setting applies to the whole pattern, and the construct belongs at the very start of the expression. |
15 | `(?-imx)` | Perl-style syntax for turning off the i, m, or x options. Python's re module accepts flag removal only in the scoped form `(?-imx:re)`. |
16 | `(?:re)` | Groups regular expressions without remembering the matched text. |
17 | `(?imx:re)` | Turns on the i, m, or x options within the parentheses only (Python 3.6 and later). |
18 | `(?-imx:re)` | Turns off the i, m, or x options within the parentheses only (Python 3.6 and later). |
19 | `(?#...)` | Comment. |
20 | `(?=re)` | Positive lookahead: asserts that re matches at this position without consuming any characters. |
21 | `(?!re)` | Negative lookahead: asserts that re does not match at this position without consuming any characters. |
22 | `(?>re)` | Matches an independent pattern without backtracking (atomic group; supported by re from Python 3.11). |
23 | `\w` | Matches word characters. |
24 | `\W` | Matches nonword characters. |
25 | `\s` | Matches whitespace. For ASCII text, equivalent to `[ \t\n\r\f\v]`. |
26 | `\S` | Matches nonwhitespace. |
27 | `\d` | Matches digits. For ASCII text, equivalent to `[0-9]`. |
28 | `\D` | Matches nondigits. |
29 | `\A` | Matches beginning of string. |
30 | `\Z` | Matches end of string. In Python's re module it matches only at the very end, not before a trailing newline as in Perl. |
31 | `\z` | Matches end of string in Perl-style engines. Python's re module has traditionally spelled this `\Z`. |
32 | `\G` | Matches the point where the last match finished (Perl-style; not supported by Python's re module). |
33 | `\b` | Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets. |
34 | `\B` | Matches nonword boundaries. |
35 | `\n`, `\t`, etc. | Matches newlines, carriage returns, tabs, etc. |
36 | `\1`...`\9` | Matches the nth grouped subexpression. |
37 | `\10` | Matches the nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code. |
Sr.No. | Example | Description |
---|---|---|
1 | `python` | Match "python". |
Sr.No. | Example | Description |
---|---|---|
1 | `[Pp]ython` | Match "Python" or "python" |
2 | `rub[ye]` | Match "ruby" or "rube" |
3 | `[aeiou]` | Match any one lowercase vowel |
4 | `[0-9]` | Match any digit; same as `[0123456789]` |
5 | `[a-z]` | Match any lowercase ASCII letter |
6 | `[A-Z]` | Match any uppercase ASCII letter |
7 | `[a-zA-Z0-9]` | Match any of the above |
8 | `[^aeiou]` | Match anything other than a lowercase vowel |
9 | `[^0-9]` | Match anything other than a digit |
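A few of the bracket expressions above, tried against a small word list made up for illustration. re.fullmatch requires the whole string to match the pattern, which makes it convenient for testing single words.

```python
import re

words = ["Python", "python", "jython", "ruby", "rube", "rubs"]

# [Pp] accepts either capitalisation of the first letter
pythons = [w for w in words if re.fullmatch(r"[Pp]ython", w)]

# [ye] accepts exactly one of the two listed letters
rubies = [w for w in words if re.fullmatch(r"rub[ye]", w)]

# A leading ^ inside the brackets negates the class
consonants = re.findall(r"[^aeiou]", "regex")
digits_only = re.sub(r"[^0-9]", "", "2004-959-559")

print(pythons)      # ['Python', 'python']
print(rubies)       # ['ruby', 'rube']
print(consonants)   # ['r', 'g', 'x']
print(digits_only)  # 2004959559
```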
Sr.No. | Example | Description |
---|---|---|
1 | `.` | Match any character except newline |
2 | `\d` | Match a digit: `[0-9]` |
3 | `\D` | Match a nondigit: `[^0-9]` |
4 | `\s` | Match a whitespace character: `[ \t\r\n\f\v]` |
5 | `\S` | Match nonwhitespace: `[^ \t\r\n\f\v]` |
6 | `\w` | Match a single word character: `[A-Za-z0-9_]` |
7 | `\W` | Match a nonword character: `[^A-Za-z0-9_]` |
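The shorthand classes can be checked quickly with re.findall. The sample string below is an assumption for illustration and uses only ASCII characters, for which the bracket equivalents in the table hold.

```python
import re

text = "Room 42,\tfloor_3\n"

digits = re.findall(r"\d", text)      # single digits
spaces = re.findall(r"\s", text)      # space, tab and newline
words = re.findall(r"\w+", text)      # runs of letters, digits and underscore
nonwords = re.findall(r"\W", text)    # everything \w does not match

print(digits)     # ['4', '2', '3']
print(spaces)     # [' ', '\t', '\n']
print(words)      # ['Room', '42', 'floor_3']
print(nonwords)   # [' ', ',', '\t', '\n']
```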
Sr.No. | Example | Description |
---|---|---|
1 | `ruby?` | Match "rub" or "ruby": the y is optional |
2 | `ruby*` | Match "rub" plus 0 or more ys |
3 | `ruby+` | Match "rub" plus 1 or more ys |
4 | `\d{3}` | Match exactly 3 digits |
5 | `\d{3,}` | Match 3 or more digits |
6 | `\d{3,5}` | Match 3, 4, or 5 digits |
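The repetition operators can be compared side by side. This sketch records, for three sample words, whether each of ruby?, ruby* and ruby+ matches the whole word, and then shows the counted forms on digit strings chosen for illustration.

```python
import re

results = {}
for word in ["rub", "ruby", "rubyyy"]:
    results[word] = (
        bool(re.fullmatch(r"ruby?", word)),   # y optional
        bool(re.fullmatch(r"ruby*", word)),   # zero or more ys
        bool(re.fullmatch(r"ruby+", word)),   # one or more ys
    )
print(results)

# Counted repetition: note there is no space inside the braces
exactly_three = re.findall(r"\d{3}", "1234567")
three_to_five = re.findall(r"\d{3,5}", "12 123 1234567")
print(exactly_three)   # ['123', '456']
print(three_to_five)   # ['123', '12345']
```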
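The nongreedy forms match the smallest number of repetitions:
Sr.No. | Example | Description |
---|---|---|
1 | `<.*>` | Greedy repetition: matches as much as possible, from a `<` to the last `>` that still allows a match. |
2 | `<.*?>` | Nongreedy: matches as little as possible, from a `<` to the nearest following `>`. |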
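The difference between greedy and nongreedy repetition shows up clearly on a string with more than one closing bracket. The sample string here is an assumption chosen for illustration.

```python
import re

text = "<b>bold</b> text"   # illustrative sample, not from the original tutorial

greedy = re.search(r"<.*>", text).group()    # runs to the last >
lazy = re.search(r"<.*?>", text).group()     # stops at the first >
all_tags = re.findall(r"<.*?>", text)        # each tag separately

print(greedy)     # <b>bold</b>
print(lazy)       # <b>
print(all_tags)   # ['<b>', '</b>']
```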
Sr.No. | Example | Description |
---|---|---|
1 | `\D\d+` | No group: + repeats `\d` |
2 | `(\D\d)+` | Grouped: + repeats the `\D\d` pair |
3 | `([Pp]ython(, )?)+` | Match "Python", "Python, python, python", etc. |
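A short sketch of how parentheses change what a quantifier repeats, using sample strings made up for illustration. When a group is repeated, the match object keeps only the text from its last repetition.

```python
import re

# Without a group, + applies only to \d
ungrouped_ok = re.fullmatch(r"\D\d+", "a123") is not None
ungrouped_fail = re.fullmatch(r"\D\d+", "a1b2") is None

# With a group, + repeats the whole \D\d pair
grouped = re.fullmatch(r"(\D\d)+", "a1b2c3")
last_pair = grouped.group(1)   # text of the final repetition

listing = re.fullmatch(r"([Pp]ython(, )?)+", "Python, python, python")

print(ungrouped_ok, ungrouped_fail)   # True True
print(last_pair)                      # c3
print(listing is not None)            # True
```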
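This matches a previously matched group again:
Sr.No. | Example | Description |
---|---|---|
1 | `([Pp])ython&\1ails` | Match python&pails or Python&Pails |
2 | `(['"])[^\1]*\1` | Single or double-quoted string. `\1` matches whatever the 1st group matched, `\2` matches whatever the 2nd group matched, etc. Note that inside brackets `\1` is not a backreference in Python (it denotes the character with octal code 1), so `(['"]).*?\1` is a more reliable way to write this. |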
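Backreferences can be tried out as follows. The quoted-string pattern uses a lazy `.*?` between the quotes rather than a bracket expression, because a backreference has no special meaning inside brackets; the sample sentence and variable names are assumptions for illustration.

```python
import re

pails = re.compile(r"([Pp])ython&\1ails")
same_lower = pails.fullmatch("python&pails") is not None
same_upper = pails.fullmatch("Python&Pails") is not None
mixed = pails.fullmatch("Python&pails") is None   # \1 must repeat the same letter

# \1 forces the closing quote to be the same character as the opening one
quoted = re.compile(r"""(['"])(.*?)\1""")
inner = quoted.search('say "it\'s fine" now').group(2)

print(same_lower, same_upper, mixed)   # True True True
print(inner)                           # it's fine
```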
Sr.No. | Example | Description |
---|---|---|
1 | `python\|perl` | Match "python" or "perl" |
2 | `rub(y\|le)` | Match "ruby" or "ruble" |
3 | `Python(!+\|\?)` | "Python" followed by one or more ! or one ? |
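The alternation examples above can be checked with a few lines; the input strings are made up for illustration.

```python
import re

either = re.findall(r"python|perl", "perl and python")

# The group limits the alternation to the ending of the word
ending = re.fullmatch(r"rub(y|le)", "ruble").group(1)

# One or more "!" or exactly one "?" after "Python"
punct = [bool(re.fullmatch(r"Python(!+|\?)", s))
         for s in ["Python!!!", "Python?", "Python??"]]

print(either)   # ['perl', 'python']
print(ending)   # le
print(punct)    # [True, True, False]
```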
Anchors specify the position at which a match must occur.
Sr.No. | Example | Description |
---|---|---|
1 | `^Python` | Match "Python" at the start of a string or internal line |
2 | `Python$` | Match "Python" at the end of a string or line |
3 | `\APython` | Match "Python" at the start of a string |
4 | `Python\Z` | Match "Python" at the end of a string |
5 | `\bPython\b` | Match "Python" at a word boundary |
6 | `\brub\B` | `\B` is a nonword boundary: match "rub" in "rube" and "ruby" but not alone |
7 | `Python(?=!)` | Match "Python", if followed by an exclamation point. |
8 | `Python(?!!)` | Match "Python", if not followed by an exclamation point. |
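The anchors are easiest to tell apart on a multi-line string. In this sketch (sample text and variable names are illustrative), ^ matches at internal line starts only when re.M is given, while \A always means the start of the whole string.

```python
import re

text = "Python is fun\nPython!"

start_only = re.findall(r"^Python", text)            # start of string only
each_line = re.findall(r"^Python", text, re.M)       # start of every line
string_start = re.findall(r"\APython", text, re.M)   # \A ignores re.M

# Lookaheads check the next character without consuming it
followed = re.search(r"Python(?=!)", text).start()
not_followed = re.search(r"Python(?!!)", text).start()

# \B demands a word character right after "rub"
inside_word = [m.start() for m in re.finditer(r"\brub\B", "rub ruby rube")]

print(len(start_only), len(each_line), len(string_start))   # 1 2 1
print(followed, not_followed)                               # 14 0
print(inside_word)                                          # [4, 9]
```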
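Sr.No. | Example | Description |
---|---|---|
1 | `R(?#comment)` | Matches "R". All the rest is a comment |
2 | `R(?i)uby` | In Perl-style engines, case-insensitive while matching "uby". Python's re module expects a global `(?i)` at the start of the pattern, where it applies to the whole expression; placing it mid-pattern is an error from Python 3.11. |
3 | `R(?i:uby)` | Case-insensitive only while matching "uby" (Python 3.6 and later) |
4 | `rub(?:y\|le)` | Group only without creating a `\1` backreference |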
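The constructs in this last table can be sketched as follows. The scoped flag form (?i:...) assumes Python 3.6 or later, and the test strings are invented for illustration.

```python
import re

# (?#...) is skipped entirely by the regex engine
with_comment = re.fullmatch(r"R(?#this is a comment)uby", "Ruby") is not None

# (?i:...) ignores case only inside the parentheses (Python 3.6+)
scoped_upper = re.fullmatch(r"R(?i:uby)", "RUBY") is not None
scoped_lower_r = re.fullmatch(r"R(?i:uby)", "rUBY") is None   # the R stays case-sensitive

# (?:...) groups without capturing, so groups() is empty
captured = re.fullmatch(r"rub(?:y|le)", "ruble").groups()

print(with_comment, scoped_upper, scoped_lower_r)   # True True True
print(captured)                                     # ()
```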