📅  Last modified: 2020-12-23 04:59:44             🧑  Author: Mango
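A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.
The re module provides full support for Perl-like regular expressions in Python. If an error occurs while compiling or using a regular expression, the re module raises the exception re.error.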
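As a minimal sketch of the error behaviour described above, a malformed pattern can be caught with an ordinary try/except. The helper name try_compile is hypothetical and exists only for this illustration.

```python
import re

def try_compile(pattern):
    """Return a compiled pattern, or None if the pattern is malformed."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        # exc.msg describes the problem; exc.pos is the offset in the pattern
        print(f"invalid pattern {pattern!r}: {exc.msg} (position {exc.pos})")
        return None

good = try_compile(r'\d+')        # a valid pattern
bad = try_compile(r'(unclosed')   # the closing parenthesis is missing
```

Catching re.error is mostly useful when the pattern comes from user input or a configuration file rather than from the program itself.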
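We will cover two important functions that are used to handle regular expressions. First, though, note that various characters have a special meaning when they are used in a regular expression. To avoid confusion between those characters and Python's own string escapes, we write patterns as raw strings, in the form r'expression'.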
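The following sketch shows why raw strings matter. The sample text is made up; the point is only that Python processes backslash escapes in an ordinary string literal before the re module ever sees the pattern.

```python
import re

text = "a cat sat"

plain = '\bcat\b'    # Python turns each \b into a backspace character (0x08)
raw = r'\bcat\b'     # the backslashes reach re intact, so \b is a word boundary

plain_result = re.search(plain, text)   # looks for backspace + "cat" + backspace
raw_result = re.search(raw, text)       # looks for the whole word "cat"

print(plain_result)
print(raw_result.group())
```

The two literals differ in length as well: the plain string holds 5 characters, the raw string 7.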
Sr.No. | Expression & Matches |
---|---|
1 | a, X, 9, < ordinary characters just match themselves exactly. |
2 | . (a period) matches any single character except newline '\n' |
3 | \w matches a “word” character: a letter or digit or underscore [a-zA-Z0-9_]. |
4 | \W matches any non-word character. |
5 | \b boundary between word and non-word |
6 | \s matches a single whitespace character: space, newline, return, tab |
7 | \S matches any non-whitespace character. |
8 | \t, \n, \r tab, newline, return |
9 | \d decimal digit [0-9] |
10 | ^ matches the start of the string |
11 | $ matches the end of the string |
12 | \ inhibits the “specialness” of a character. |
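A few of the basic patterns from the table can be tried out as follows. This is only a sketch: the sample strings are invented, and re.search (described later in this tutorial) is used simply because it finds a pattern anywhere in the string.

```python
import re

text = "Order 66 shipped"

two_digits = re.search(r'\d\d', text).group()     # two decimal digits
first_word = re.search(r'^\w+', text).group()     # word characters at the start
last_word = re.search(r'\b\w+$', text).group()    # a whole word at the end
space_then_two = re.search(r'\s..', text).group() # whitespace plus any two characters
literal_dot = re.search(r'\.', "v1.2").group()    # backslash removes the dot's special meaning

print(two_digits, first_word, last_word, repr(space_then_two), literal_dot)
```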
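Compilation flags let you modify some aspects of how regular expressions work. Flags are available in the re module under two names: a long name such as IGNORECASE and a short, one-letter form such as I.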
Sr.No. | Flag & Meaning |
---|---|
1 | ASCII, A Makes several escapes like \w, \b, \s and \d match only on ASCII characters with the respective property. |
2 | DOTALL, S Make . match any character, including newlines |
3 | IGNORECASE, I Do case-insensitive matches |
4 | LOCALE, L Do a locale-aware match |
5 | MULTILINE, M Multi-line matching, affecting ^ and $ |
6 | VERBOSE, X (for ‘extended’) Enable verbose REs, which can be organized more cleanly and understandably |
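The effect of several of these flags can be sketched as below. The inputs are invented for illustration, and re.findall (which returns every non-overlapping match as a list) is used for brevity even though this tutorial does not otherwise cover it.

```python
import re

# The short and long names refer to the same flag
same_flag = (re.I == re.IGNORECASE)

# IGNORECASE: letters match regardless of case
ci = re.search(r'python', 'I like PYTHON', re.I).group()

# DOTALL: without it, '.' refuses to match a newline
dot_default = re.search(r'a.b', 'a\nb')
dot_all = re.search(r'a.b', 'a\nb', re.S)

# MULTILINE: '^' also matches right after each newline
line_starts = re.findall(r'^\w+', 'one\ntwo', re.M)

# VERBOSE: whitespace and # comments inside the pattern are ignored
phone = re.compile(r"""
    (\d{3})   # prefix
    -         # literal hyphen
    (\d{4})   # line number
""", re.X)
parts = phone.search('call 555-1234').groups()
```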
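The re.match function attempts to match an RE pattern at the beginning of a string, with optional flags.
Here is the syntax for this function:
re.match(pattern, string, flags=0)
Here is the description of the parameters: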
Sr.No. | Parameter & Description |
---|---|
1 | pattern This is the regular expression to be matched. |
2 | string This is the string, which would be searched to match the pattern at the beginning of string. |
3 | flags You can specify different flags using bitwise OR (|). These are modifiers, which are listed in the table below. |
The re.match function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.
Sr.No. | Match Object Method & Description |
---|---|
1 | group(num = 0) This method returns entire match (or specific subgroup num) |
2 | groups() This method returns all matching subgroups in a tuple (empty if there weren't any) |
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# Group 1 is greedy; group 2 is non-greedy and stops at the first space
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")
When the above code is executed, it produces the following result:
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
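The groups() method mentioned above is not exercised by the example, so here is a small sketch of it alongside group(). The sample name is arbitrary.

```python
import re

m = re.match(r'(\w+) (\w+)', "Isaac Newton, physicist")

whole = m.group()        # the entire match, same as m.group(0)
first = m.group(1)       # the first parenthesized subgroup
both = m.group(1, 2)     # several groups at once, returned as a tuple
all_groups = m.groups()  # every subgroup, as a tuple

# A pattern without parentheses has no subgroups, so groups() is empty
no_groups = re.match(r'\w+', "Isaac").groups()
```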
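The re.search function searches for the first occurrence of an RE pattern within a string, with optional flags.
Here is the syntax for this function:
re.search(pattern, string, flags=0)
Here is the description of the parameters: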
Sr.No. | Parameter & Description |
---|---|
1 | pattern This is the regular expression to be matched. |
2 | string This is the string, which would be searched to match the pattern anywhere in the string. |
3 | flags You can specify different flags using bitwise OR (|). These are modifiers, which are listed in the table below. |
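The re.search function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.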
Sr.No. | Match Object Method & Description |
---|---|
1 | group(num = 0) This method returns entire match (or specific subgroup num) |
2 | groups() This method returns all matching subgroups in a tuple (empty if there weren't any) |
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# search scans the string for the first location where the pattern matches
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() :", searchObj.group())
    print("searchObj.group(1) :", searchObj.group(1))
    print("searchObj.group(2) :", searchObj.group(2))
else:
    print("Nothing found!!")
When the above code is executed, it produces the following result:
searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter
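Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).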
#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# match succeeds only if the pattern matches at the start of the string
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")

# search finds the first occurrence anywhere in the string
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() :", searchObj.group())
else:
    print("Nothing found!!")
When the above code is executed, it produces the following result:
No match!!
search --> searchObj.group() : dogs
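One subtlety worth a sketch: a search anchored with ^ is not quite the same thing as match once MULTILINE is involved. The three-line sample string below is arbitrary.

```python
import re

text = "A\nB\nX"

# Without MULTILINE, ^ only matches at the very start of the string
anchored_search = re.search(r'^X', text)

# With MULTILINE, ^ also matches at the start of each line
multiline_search = re.search(r'^X', text, re.M)

# match still looks only at the start of the string, even with MULTILINE
multiline_match = re.match(r'X', text, re.M)
```

So match is tied to the beginning of the string itself, not to the beginning of a line.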
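sub is one of the most important re functions that use regular expressions.
re.sub(pattern, repl, string, count=0)
This function replaces occurrences of the RE pattern in string with repl, substituting all occurrences unless count is provided. It returns the modified string.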
#!/usr/bin/python3
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num :", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num :", num)
When the above code is executed, it produces the following result:
Phone Num : 2004-959-559
Phone Num : 2004959559
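Beyond plain deletion, re.sub also accepts a count limit, group references in the replacement, and a function as the replacement. The sketch below uses made-up inputs to show each of these.

```python
import re

date = "2020-12-23"

# \1, \2, \3 in the replacement refer to the captured groups
swapped = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', date)

# count=1 replaces only the first occurrence
first_only = re.sub(r'-', '/', date, count=1)

# repl may be a function that receives the match object and returns a string
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2),
                 "3 apples and 10 pears")

print(swapped, first_only, doubled, sep='\n')
```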
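Regular expression literals may include optional modifiers that control various aspects of matching. The modifiers are specified as optional flags. You can provide multiple modifiers using bitwise OR (|), as shown previously, and each may be represented by one of the following: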
Sr.No. | Modifier & Description |
---|---|
1 | re.I Performs case-insensitive matching. |
2 | re.L Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). |
3 | re.M Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string). |
4 | re.S Makes a period (dot) match any character, including a newline. |
5 | re.U Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. |
6 | re.X Permits “cuter” regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker. |
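Combining modifiers with bitwise OR can be sketched as follows. The three-line text is invented; the pattern needs both re.I and re.M to succeed, so each flag alone finds nothing.

```python
import re

text = "first line\nPython rocks\nlast line"
pattern = r'^python.*$'

# Both flags: case is ignored and ^ / $ work per line
both = re.search(pattern, text, re.I | re.M)

# Only re.I: ^ still refers to the start of the whole string
only_ignorecase = re.search(pattern, text, re.I)

# Only re.M: the capital P does not match the lowercase p
only_multiline = re.search(pattern, text, re.M)
```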
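Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.
The following tables list the regular expression syntax available in Python: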
Sr.No. | Example & Description |
---|---|
1 | python Match “python”. |
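Escaping control characters can be sketched like this. The price string is made up. re.escape is a standard re function that adds the backslashes for you, which helps when the literal text is not known in advance.

```python
import re

price = "cost: $5.00 (approx.)"

# Unescaped, $ is the end-of-string anchor, so this cannot match
unescaped = re.search(r'$5.00', price)

# Escaped, $ and . are treated as literal characters
escaped = re.search(r'\$5\.00', price).group()

# re.escape builds an escaped pattern from arbitrary literal text
auto_pattern = re.escape("(approx.)")
found = re.search(auto_pattern, price).group()
```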
Sr.No. | Example & Description |
---|---|
1 | [Pp]ython Match “Python” or “python” |
2 | rub[ye] Match “ruby” or “rube” |
3 | [aeiou] Match any one lowercase vowel |
4 | [0-9] Match any digit; same as [0123456789] |
5 | [a-z] Match any lowercase ASCII letter |
6 | [A-Z] Match any uppercase ASCII letter |
7 | [a-zA-Z0-9] Match any of the above |
8 | [^aeiou] Match anything other than a lowercase vowel |
9 | [^0-9] Match anything other than a digit |
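The character classes in the table can be checked with a short sketch. The input strings are arbitrary, and re.findall is used to collect every match.

```python
import re

either_case = re.findall(r'[Pp]ython', 'Python and python')
ruby_or_rube = re.findall(r'rub[ye]', 'ruby rube rubs')

# Remove every lowercase vowel
no_vowels = re.sub(r'[aeiou]', '', 'regular')

# A leading ^ inside the brackets negates the class
non_digits = re.findall(r'[^0-9]', 'a1b2')
```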
Sr.No. | Example & Description |
---|---|
1 | . Match any character except newline |
2 | \d Match a digit: [0-9] |
3 | \D Match a nondigit: [^0-9] |
4 | \s Match a whitespace character: [ \t\r\n\f] |
5 | \S Match nonwhitespace: [^ \t\r\n\f] |
6 | \w Match a single word character: [A-Za-z0-9_] |
7 | \W Match a nonword character: [^A-Za-z0-9_] |
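One caveat, sketched below: for str patterns in Python 3 these classes are Unicode-aware by default, so the bracket equivalents in the table describe the behaviour under the ASCII flag. The sample characters are chosen only for illustration.

```python
import re

arabic_three = '\u0663'   # ARABIC-INDIC DIGIT THREE

# By default \d accepts any Unicode decimal digit
unicode_digit = re.fullmatch(r'\d', arabic_three)
# With re.ASCII it is restricted to [0-9]
ascii_digit = re.fullmatch(r'\d', arabic_three, re.ASCII)

# The same applies to \w
unicode_word = re.search(r'\w+', 'na\u00efve').group()
ascii_word = re.search(r'\w+', 'na\u00efve', re.ASCII).group()

# \s matches spaces, tabs and newlines alike
spaces = re.findall(r'\s', 'a b\tc\n')
```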
Sr.No. | Example & Description |
---|---|
1 | ruby? Match “rub” or “ruby”: the y is optional |
2 | ruby* Match “rub” plus 0 or more ys |
3 | ruby+ Match “rub” plus 1 or more ys |
4 | \d{3} Match exactly 3 digits |
5 | \d{3,} Match 3 or more digits |
6 | \d{3,5} Match 3, 4, or 5 digits |
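The repetition operators can be compared side by side in a short sketch. The test strings are invented, and re.findall lists every non-overlapping match from left to right.

```python
import re

words = 'rub ruby rubyy'
optional_y = re.findall(r'ruby?', words)   # y zero or one time
any_ys = re.findall(r'ruby*', words)       # y zero or more times
some_ys = re.findall(r'ruby+', words)      # y one or more times

numbers = '12 123 1234567'
exactly_3 = re.findall(r'\d{3}', numbers)
at_least_3 = re.findall(r'\d{3,}', numbers)
three_to_5 = re.findall(r'\d{3,5}', numbers)
```

Note that \d{3} splits the seven-digit run into two matches and leaves the last digit unmatched, because matches never overlap.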
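Nongreedy repetition matches the smallest number of repetitions:
Sr.No. | Example & Description |
---|---|
1 | <.*> Greedy repetition: matches from the first < to the last > it can reach |
2 | <.*?> Nongreedy: matches from the first < to the nearest following > |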
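A short sketch makes the difference concrete. The sample string, with one opening bracket and two closing brackets, is an assumption chosen for illustration.

```python
import re

text = '<python>perl>'

# Greedy: .* grabs as much as possible, so the match runs to the last >
greedy = re.search(r'<.*>', text).group()

# Nongreedy: .*? grabs as little as possible, so the match stops at the first >
nongreedy = re.search(r'<.*?>', text).group()

print(greedy)
print(nongreedy)
```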
Sr.No. | Example & Description |
---|---|
1 | \D\d+ No group: + repeats \d |
2 | (\D\d)+ Grouped: + repeats \D\d pair |
3 | ([Pp]ython(, )?)+ Match “Python”, “Python, python, python”, etc. |
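The effect of parentheses on repetition can be sketched as follows. The inputs are made up, and the last pattern is written with a space after the comma so that a comma-separated list actually matches.

```python
import re

text = 'a1b2c3'

# No group: + applies to \d only, so the match stops after the first digit run
ungrouped = re.match(r'\D\d+', text).group()

# Grouped: + repeats the whole nondigit-digit pair
grouped = re.match(r'(\D\d)+', text)
grouped_whole = grouped.group()
# A repeated group keeps only its last repetition
grouped_last = grouped.group(1)

listing = re.fullmatch(r'([Pp]ython(, )?)+', 'Python, python, python')
```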
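Backreferences match a previously matched group again:
Sr.No. | Example & Description |
---|---|
1 | ([Pp])ython&\1ails Match python&pails or Python&Pails |
2 | (['"])[^\1]*\1 Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. Note that inside [...] Python treats \1 as a character escape rather than a backreference, so [^\1] does not actually exclude the quote character. |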
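Backreferences can be sketched as below. Because \1 is not a backreference inside a character class in Python, the quoted-string pattern here is a substitute that uses a nongreedy .*? instead of [^\1]*; it is an illustrative variant, not the pattern from the table.

```python
import re

same_letter = re.compile(r'([Pp])ython&\1ails')
upper = same_letter.search('Python&Pails')
lower = same_letter.search('python&pails')
mixed = same_letter.search('Python&pails')   # P and p differ, so no match

# Opening quote in group 1, body in group 2, then the same quote again
quoted = re.compile(r'''(['"])(.*?)\1''')
bodies = [m.group(2) for m in quoted.finditer('''say "hi" and 'bye' now''')]
apostrophe = quoted.search('"it\'s"').group(2)
```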
Sr.No. | Example & Description |
---|---|
1 | python|perl Match “python” or “perl” |
2 | rub(y|le) Match “ruby” or “ruble” |
3 | Python(!+|\?) “Python” followed by one or more ! or one ? |
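The alternatives above can be exercised with a brief sketch using made-up inputs. re.fullmatch requires the whole string to match, which makes the accepted and rejected words easy to see.

```python
import re

langs = re.findall(r'python|perl', 'perl, python')

# The parentheses limit the alternation to the ending
endings = [re.fullmatch(r'rub(y|le)', w) is not None
           for w in ['ruby', 'ruble', 'rubs']]

# One or more exclamation marks, or exactly one question mark
excited = re.search(r'Python(!+|\?)', 'Python!!!').group()
puzzled = re.search(r'Python(!+|\?)', 'Python??').group()
```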
Anchors specify the position at which a match must occur.
Sr.No. | Example & Description |
---|---|
1 | ^Python Match “Python” at the start of a string or internal line |
2 | Python$ Match “Python” at the end of a string or line |
3 | \APython Match “Python” at the start of a string |
4 | Python\Z Match “Python” at the end of a string |
5 | \bPython\b Match “Python” at a word boundary |
6 | \brub\B \B is nonword boundary: match “rub” in “rube” and “ruby” but not alone |
7 | Python(?=!) Match “Python”, if followed by an exclamation point. |
8 | Python(?!!) Match “Python”, if not followed by an exclamation point. |
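The anchors and lookaheads in the table can be sketched as follows, with invented inputs. Matching an internal line with ^ requires the MULTILINE flag, whereas \A and \Z always refer to the string as a whole.

```python
import re

text = "Python\nis fun"

line_start = re.search(r'^is', text, re.M)     # ^ matches after the newline
string_start = re.search(r'\Ais', text, re.M)  # \A ignores MULTILINE

# $ also matches just before a trailing newline; \Z does not
dollar = re.search(r'Python$', "Python\n")
big_z = re.search(r'Python\Z', "Python\n")

inside_word = re.search(r'\brub\B', 'ruby')    # "rub" followed by a word character
alone = re.search(r'\brub\B', 'rub')

# Lookaheads test what follows without consuming it
followed = re.search(r'Python(?=!)', 'Python!').group()
not_followed = re.search(r'Python(?!!)', 'Python!')
```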
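Sr.No. | Example & Description |
---|---|
1 | R(?#comment) Matches “R”. All the rest is a comment |
2 | R(?i)uby Intended as case-insensitive while matching “uby”. In Python an inline (?i) applies to the whole pattern and must be placed at its start; since Python 3.11 this form raises re.error |
3 | R(?i:uby) Case-insensitive only while matching “uby” |
4 | rub(?:y|le) Group only without creating \1 backreference |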
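The special-syntax forms can be tried with the sketch below. It assumes Python 3.6 or later for the scoped (?i:...) form and places the global (?i) at the start of the pattern, which is the position Python accepts.

```python
import re

# (?#...) is a comment and matches nothing
commented = re.fullmatch(r'R(?#the letter R)', 'R')

# Scoped flag: only "uby" is case-insensitive, the R is not
scoped_ok = re.fullmatch(r'R(?i:uby)', 'RUBY')
scoped_fail = re.fullmatch(r'R(?i:uby)', 'rUBY')

# Global inline flag at the start: the whole pattern ignores case
global_flag = re.fullmatch(r'(?i)ruby', 'RuBy')

# (?:...) groups without capturing, so there are no subgroups to report
non_capturing = re.match(r'rub(?:y|le)', 'ruble')
captured = non_capturing.groups()
```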