📜  Python Regular Expressions

📅  Last modified: 2020-12-23 05:23:21             🧑  Author: Mango


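A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

The Python module re provides full support for Perl-like regular expressions. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.

We will cover two important functions that are used to handle regular expressions. But first, a small point: various characters have a special meaning when they are used in a regular expression. To avoid confusion when dealing with regular expressions, we use raw strings, written as r'expression'.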
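As a small illustration of the two points above, the sketch below compares a raw string with its manually escaped equivalent and shows re.error being raised for an invalid pattern. The variable names and the deliberately broken pattern are made up for this example.

```python
import re

# A raw string keeps the backslash as written, so both spellings
# describe the same pattern text.
raw_pattern = r'\d+'
escaped_pattern = '\\d+'
same_pattern = raw_pattern == escaped_pattern

# An unbalanced parenthesis is an invalid pattern, so compiling it
# makes the re module raise re.error.
try:
    re.compile(r'(unclosed')
    compile_failed = False
except re.error as exc:
    compile_failed = True
    print("Invalid pattern:", exc)

print(same_pattern, compile_failed)
```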

The match Function

This function attempts to match the RE pattern against the beginning of a string, with optional flags.

Here is the syntax for this function:

re.match(pattern, string, flags=0)

Here is the description of the parameters:

Sr.No.  Parameter  Description
1       pattern    The regular expression to be matched.
2       string     The string to be searched; the pattern must match at the beginning of the string.
3       flags      Optional modifiers, combined with bitwise OR (|). They are listed in the table below.

The re.match function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.

Sr.No.  Match Object Method  Description
1       group(num=0)         Returns the entire match (or the specific subgroup num).
2       groups()             Returns all matching subgroups in a tuple (empty if there weren’t any).

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# Group 1 is greedy, group 2 is non-greedy; flags are combined with |
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

When the above code is executed, it produces the following result:

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
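The groups() method listed in the table can be tried on the same sentence. This is only a sketch; the variable names are chosen for illustration.

```python
import re

line = "Cats are smarter than dogs"
match_obj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

# groups() returns every captured subgroup as one tuple.
all_groups = match_obj.groups()

# group(0) is the same as group(): the entire match.
whole = match_obj.group(0)

print(all_groups)
print(whole)
```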

The search Function

This function searches for the first occurrence of the RE pattern within a string, with optional flags.

Here is the syntax for this function:

re.search(pattern, string, flags=0)

Here is the description of the parameters:

Sr.No.  Parameter  Description
1       pattern    The regular expression to be matched.
2       string     The string to be searched; the pattern may match anywhere in the string.
3       flags      Optional modifiers, combined with bitwise OR (|). They are listed in the table below.

The re.search function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.

Sr.No.  Match Object Method  Description
1       group(num=0)         Returns the entire match (or the specific subgroup num).
2       groups()             Returns all matching subgroups in a tuple (empty if there weren’t any).

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# search scans the string for the first place where the pattern matches
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

When the above code is executed, it produces the following result:

searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter
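Because search returns None when nothing is found, it helps to check the result before calling any method on it. The sketch below also uses span(), a match object method not covered in the table above, to show where the match was found; the search words are arbitrary.

```python
import re

line = "Cats are smarter than dogs"

found = re.search(r'smarter', line)    # present in the string
missing = re.search(r'birds', line)    # absent, so the result is None

# span() gives the (start, end) offsets of the match.
position = found.span() if found else None

print(position)
print(missing)
```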

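Matching Versus Searching

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).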

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# "dogs" is not at the beginning of the string, so match fails
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# search finds "dogs" at the end of the string
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")

When the above code is executed, it produces the following result:

No match!!
search --> searchObj.group() :  dogs
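One detail that is easy to miss: even with re.M, match still looks only at the very start of the string, not at the start of each line. The two-line sample text in this sketch is invented for illustration.

```python
import re

text = "Cats are smart\ndogs are loyal"

# match is anchored to the start of the whole string, even with re.M.
at_start = re.match(r'dogs', text, re.M)

# search combined with ^ and re.M can find "dogs" at the start of line 2.
at_line_start = re.search(r'^dogs', text, re.M)

print(at_start)
print(at_line_start.group(), at_line_start.start())
```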

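Search and Replace

One of the most important re functions that use regular expressions is sub.

Syntax

re.sub(pattern, repl, string, count=0, flags=0)

This function replaces occurrences of the RE pattern in string with repl. All occurrences are replaced unless count is provided, in which case at most count occurrences are replaced. The function returns the modified string.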

#!/usr/bin/python3
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

When the above code is executed, it produces the following result:

Phone Num :  2004-959-559
Phone Num :  2004959559
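Two further uses of sub can be sketched briefly: limiting the number of replacements with count, and referring to captured groups inside the replacement string. The phone number is reused from the example above; the reordering is purely illustrative.

```python
import re

phone = "2004-959-559"

# count=1 replaces only the first hyphen.
first_only = re.sub(r'-', "", phone, count=1)

# \1, \2 and \3 in the replacement refer to the captured groups.
reordered = re.sub(r'(\d+)-(\d+)-(\d+)', r'\3-\2-\1', phone)

print(first_only)
print(reordered)
```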

Regular Expression Modifiers: Option Flags

Regular expressions may be given optional modifiers to control various aspects of matching. The modifiers are specified as optional flags. You can provide multiple modifiers using bitwise OR (|), as shown previously, and each may be any of the following:

Sr.No.  Modifier  Description
1       re.I      Performs case-insensitive matching.
2       re.L      Interprets words according to the current locale. This affects the alphabetic group (\w and \W) as well as word boundary behavior (\b and \B). In Python 3 it can be used only with bytes patterns.
3       re.M      Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
4       re.S      Makes a period (dot) match any character, including a newline.
5       re.U      Interprets letters according to the Unicode character set. This affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns.
6       re.X      Permits “cuter” regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.
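The flags in the table can be tried out with a short sketch. It uses re.findall and re.compile, which are standard re functions not otherwise covered in this article, and the sample text is invented.

```python
import re

text = "Python\npython rocks"

# re.I ignores case; re.M lets ^ match at the start of every line.
starts = re.findall(r'^python', text, re.I | re.M)

# Without re.S the dot stops at the newline; with re.S it crosses it.
without_s = re.search(r'Python.python', text)
with_s = re.search(r'Python.python', text, re.S)

# re.X allows whitespace and # comments inside the pattern.
verbose = re.compile(r"""
    (\d{4})   # first block of digits
    -
    (\d{3})   # second block of digits
""", re.X)
blocks = verbose.match("2004-959").groups()

print(starts, without_s, with_s is not None, blocks)
```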

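Regular Expression Patterns

Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.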
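To make the escaping rule concrete, here is a minimal sketch with an invented price string. It escapes the control characters by hand and also with re.escape, a standard helper that adds the backslashes automatically.

```python
import re

price = "Total: $5.00 (approx.)"

# $ and . are control characters, so each one gets a backslash.
by_hand = re.search(r'\$5\.00', price).group()

# An unescaped dot matches any character, not just a literal dot.
loose = re.search(r'5.00', "5x00").group()

# re.escape builds the escaped pattern from a literal string.
auto_pattern = re.escape("$5.00")
by_helper = re.search(auto_pattern, price).group()

print(by_hand, loose, by_helper)
```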

The following table lists the regular expression syntax that is available in Python:

Sr.No.  Pattern       Description
1       ^             Matches beginning of the string (and of each line when re.M is set).
2       $             Matches end of the string or just before a trailing newline (and end of each line when re.M is set).
3       .             Matches any single character except newline. Using the s option (re.S) allows it to match newline as well.
4       […]           Matches any single character in brackets.
5       [^…]          Matches any single character not in brackets.
6       re*           Matches 0 or more occurrences of preceding expression.
7       re+           Matches 1 or more occurrences of preceding expression.
8       re?           Matches 0 or 1 occurrence of preceding expression.
9       re{n}         Matches exactly n occurrences of preceding expression.
10      re{n,}        Matches n or more occurrences of preceding expression.
11      re{n,m}       Matches at least n and at most m occurrences of preceding expression.
12      a|b           Matches either a or b.
13      (re)          Groups regular expressions and remembers matched text.
14      (?imx)        Turns on the i, m, or x options for the whole regular expression. Python expects this at the very start of the pattern.
15      (?-imx)       Turns off the i, m, or x options in Perl. Python's re module does not accept this bare form; use the scoped form (?-imx:re) instead.
16      (?:re)        Groups regular expressions without remembering matched text.
17      (?imx:re)     Temporarily toggles on i, m, or x options within parentheses (Python 3.6 and later).
18      (?-imx:re)    Temporarily toggles off i, m, or x options within parentheses (Python 3.6 and later).
19      (?#…)         Comment.
20      (?=re)        Positive lookahead: specifies position using a pattern, without consuming any characters.
21      (?!re)        Negative lookahead: specifies position using pattern negation, without consuming any characters.
22      (?>re)        Matches independent pattern without backtracking (atomic group; Python 3.11 and later).
23      \w            Matches word characters.
24      \W            Matches nonword characters.
25      \s            Matches whitespace. For ASCII text, equivalent to [ \t\n\r\f\v].
26      \S            Matches nonwhitespace.
27      \d            Matches digits. For ASCII text, equivalent to [0-9].
28      \D            Matches nondigits.
29      \A            Matches beginning of string.
30      \Z            Matches end of string. Unlike Perl, Python's \Z does not match before a trailing newline.
31      \z            Matches end of string in Perl. The re module has traditionally not recognized it; use \Z instead.
32      \G            Matches point where last match finished in Perl. Not supported by the re module.
33      \b            Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
34      \B            Matches nonword boundaries.
35      \n, \t, etc.  Matches newlines, carriage returns, tabs, etc.
36      \1…\9         Matches nth grouped subexpression.
37      \10           Matches the 10th grouped subexpression. In Python, only escapes that start with 0 or have three octal digits (such as \010) are read as octal character codes.

Regular Expression Examples

Literal characters

Sr.No.  Example  Description
1       python   Match “python”.

Character classes

Sr.No.  Example      Description
1       [Pp]ython    Match “Python” or “python”
2       rub[ye]      Match “ruby” or “rube”
3       [aeiou]      Match any one lowercase vowel
4       [0-9]        Match any digit; same as [0123456789]
5       [a-z]        Match any lowercase ASCII letter
6       [A-Z]        Match any uppercase ASCII letter
7       [a-zA-Z0-9]  Match any of the above
8       [^aeiou]     Match anything other than a lowercase vowel
9       [^0-9]       Match anything other than a digit
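A few of these character classes can be checked with re.findall, which returns every non-overlapping match as a list. The sample sentence is invented for this sketch.

```python
import re

text = "Python and python; ruby, rube, 2 vowels"

pythons = re.findall(r'[Pp]ython', text)   # either capitalization
rubies = re.findall(r'rub[ye]', text)      # "ruby" or "rube"
digits = re.findall(r'[0-9]', text)        # any single digit

# A leading ^ inside the brackets negates the class.
non_vowels = re.findall(r'[^aeiou]', "audio")

print(pythons, rubies, digits, non_vowels)
```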

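Special character classes

Sr.No.  Example  Description
1       .        Match any character except newline
2       \d       Match a digit: [0-9]
3       \D       Match a nondigit: [^0-9]
4       \s       Match a whitespace character: [ \t\r\n\f\v]
5       \S       Match nonwhitespace: [^ \t\r\n\f\v]
6       \w       Match a single word character: [A-Za-z0-9_]
7       \W       Match a nonword character: [^A-Za-z0-9_]

The bracketed equivalents hold for ASCII text. For str patterns, Python also matches the corresponding Unicode characters unless re.ASCII is set.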
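The shorthand classes can be tried on a short ASCII string. This sketch uses an invented sample and re.findall to list the matches.

```python
import re

text = "Room 42, floor_3"

digits = re.findall(r'\d', text)      # single digits
words = re.findall(r'\w+', text)      # runs of word characters
spaces = re.findall(r'\s', text)      # whitespace characters
nonword = re.findall(r'\W', text)     # everything \w rejects

print(digits, words, spaces, nonword)
```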

Repetition cases

Sr.No.  Example  Description
1       ruby?    Match “rub” or “ruby”: the y is optional
2       ruby*    Match “rub” plus 0 or more ys
3       ruby+    Match “rub” plus 1 or more ys
4       \d{3}    Match exactly 3 digits
5       \d{3,}   Match 3 or more digits
6       \d{3,5}  Match 3, 4, or 5 digits
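The repetition operators behave as follows on two invented sample strings. Note that re.findall scans left to right and each match consumes its characters, which is why a long digit run is split into chunks by \d{3}.

```python
import re

words = "rub ruby rubyyy"
optional_y = re.findall(r'ruby?', words)    # y is optional
any_ys = re.findall(r'ruby*', words)        # zero or more ys
some_ys = re.findall(r'ruby+', words)       # at least one y

numbers = "12 123 1234567"
exactly_3 = re.findall(r'\d{3}', numbers)
at_least_3 = re.findall(r'\d{3,}', numbers)
three_to_5 = re.findall(r'\d{3,5}', numbers)

print(optional_y, any_ys, some_ys)
print(exactly_3, at_least_3, three_to_5)
```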

Nongreedy repetition

This matches the smallest number of repetitions:

Sr.No.  Example  Description
1       <.*>     Greedy repetition: matches “<python>perl>”
2       <.*?>    Nongreedy: matches “<python>” in “<python>perl>”
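The difference between greedy and nongreedy repetition is easiest to see side by side. The sketch assumes the sample string "<python>perl>".

```python
import re

text = "<python>perl>"

# Greedy: .* grabs as much as possible, up to the last ">".
greedy = re.match(r'<.*>', text).group()

# Nongreedy: .*? stops at the first ">" that lets the pattern match.
lazy = re.match(r'<.*?>', text).group()

print(greedy)
print(lazy)
```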

Grouping with Parentheses

Sr.No.  Example            Description
1       \D\d+              No group: + repeats \d
2       (\D\d)+            Grouped: + repeats \D\d pair
3       ([Pp]ython(, )?)+  Match “Python”, “Python, python, python”, etc.
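A quick sketch of how grouping changes what + repeats, using invented inputs. When a group repeats, only its last repetition is kept as the captured text.

```python
import re

# Without a group, + applies only to \d, so matching stops at "b".
ungrouped = re.match(r'\D\d+', "a1b2c3").group()

# With a group, + repeats the whole \D\d pair.
grouped_match = re.match(r'(\D\d)+', "a1b2c3")
grouped = grouped_match.group()
last_pair = grouped_match.group(1)

listing = re.match(r'([Pp]ython(, )?)+', "Python, python, python").group()

print(ungrouped, grouped, last_pair)
print(listing)
```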

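Backreferences

This matches a previously matched group again:

Sr.No.  Example             Description
1       ([Pp])ython&\1ails  Match python&pails or Python&Pails
2       (['"])[^'"]*\1      Single or double-quoted string with no quote characters inside. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. A backreference cannot be used inside a character class.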
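Backreferences can be sketched as follows. The quoted-string pattern here is a variant written for this example (it captures the text between the quotes with a nongreedy group), and the sample strings are invented.

```python
import re

# \1 must repeat exactly what group 1 matched, including its case.
same_case = re.search(r'([Pp])ython&\1ails', "Python&Pails")
mixed_case = re.search(r'([Pp])ython&\1ails', "Python&pails")

# The closing quote has to be the same character as the opening one.
quote_pattern = re.compile(r"""(['"])(.*?)\1""")
double_quoted = quote_pattern.search('say "hi" now').group(2)
single_quoted = quote_pattern.search("say 'bye' now").group(2)
mismatched = quote_pattern.search("say 'oops\" now")

print(same_case.group(), mixed_case)
print(double_quoted, single_quoted, mismatched)
```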

Alternatives

Sr.No.  Example        Description
1       python|perl    Match “python” or “perl”
2       rub(y|le)      Match “ruby” or “ruble”
3       Python(!+|\?)  “Python” followed by one or more ! or one ?
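The alternation examples can be checked with a short sketch. It uses re.finditer to collect whole matches, since re.findall would return only the captured group; the sample strings are invented.

```python
import re

either = re.findall(r'python|perl', "perl and python")

# finditer yields match objects, so group() gives the whole match.
coins = [m.group() for m in re.finditer(r'rub(y|le)', "ruby ruble rube")]

excited = re.match(r'Python(!+|\?)', "Python!!!").group()
asking = re.match(r'Python(!+|\?)', "Python?").group()
plain = re.match(r'Python(!+|\?)', "Python.")

print(either, coins, excited, asking, plain)
```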

Anchors

These specify the match position.

Sr.No.  Example      Description
1       ^Python      Match “Python” at the start of a string (or of an internal line when re.M is set)
2       Python$      Match “Python” at the end of a string (or of a line when re.M is set)
3       \APython     Match “Python” at the start of a string
4       Python\Z     Match “Python” at the end of a string
5       \bPython\b   Match “Python” at a word boundary
6       \brub\B      \B is nonword boundary: match “rub” in “rube” and “ruby” but not alone
7       Python(?=!)  Match “Python”, if followed by an exclamation point.
8       Python(?!!)  Match “Python”, if not followed by an exclamation point.
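The anchors can be explored with a small sketch over invented sample strings. The lookahead results are reported as start offsets to show which occurrence of "Python" was accepted.

```python
import re

text = "Python is fun\nI like Python"

# ^ matches inner line starts only when re.M is set.
line_starts_plain = re.findall(r'^I', text)
line_starts_multi = re.findall(r'^I', text, re.M)

at_string_start = re.search(r'\APython', text) is not None
at_string_end = re.search(r'Python\Z', text) is not None

whole_words = re.findall(r'\bPython\b', "Python Pythonic")
inside_words = re.findall(r'\brub\B', "rub ruby rube")

sample = "Python! Python?"
followed = [m.start() for m in re.finditer(r'Python(?=!)', sample)]
not_followed = [m.start() for m in re.finditer(r'Python(?!!)', sample)]

print(line_starts_plain, line_starts_multi, at_string_start, at_string_end)
print(whole_words, inside_words, followed, not_followed)
```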

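Special Syntax with Parentheses

Sr.No.  Example       Description
1       R(?#comment)  Matches “R”. All the rest is a comment
2       R(?i)uby      Perl-style: case-insensitive while matching “uby”. Python applies (?i) to the whole pattern and, from 3.11 on, rejects it unless it is at the start; use the scoped form below instead.
3       R(?i:uby)     Case-insensitive only while matching “uby” (Python 3.6 and later)
4       rub(?:y|le)   Group only without creating \1 backreference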
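The forms that Python accepts can be verified with a minimal sketch, assuming Python 3.6 or later for the scoped (?i:...) syntax. The inputs are invented.

```python
import re

# (?#...) is ignored by the matcher, so this is just the pattern "Ruby".
commented = re.match(r'R(?#the first letter)uby', "Ruby").group()

# Only "uby" is case-insensitive; the leading R must still be uppercase.
scoped_ok = re.match(r'R(?i:uby)', "RUBY")
scoped_fail = re.match(r'R(?i:uby)', "rUBY")

# A non-capturing group leaves groups() empty; a normal group does not.
non_capturing = re.match(r'rub(?:y|le)', "ruble").groups()
capturing = re.match(r'rub(y|le)', "ruble").groups()

print(commented, scoped_ok.group(), scoped_fail)
print(non_capturing, capturing)
```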