Python 3 - Regular Expressions

Last modified: 2020-12-23 04:59:44    Author: Mango


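A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world.

The re module provides full support for Perl-like regular expressions in Python. The re module raises the exception re.error if an error occurs while compiling or using a regular expression.

We will cover two important functions that are used to handle regular expressions. First, though, note that many characters have a special meaning when they are used in a regular expression. To avoid confusion when dealing with regular expressions, we write patterns as raw strings of the form r'expression'.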
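To make the raw-string advice and the re.error behaviour concrete, here is a minimal sketch. The sample path and all variable names are made up for illustration and are not part of the original tutorial.

```python
import re

text = r"C:\new\table"  # raw string: the backslashes are kept literally

# In a normal string literal "\n" is one newline character;
# in a raw string it stays as two characters, a backslash and an n.
normal_len = len("\n")
raw_len = len(r"\n")

# The regex for one literal backslash is \\ ; a raw string lets us
# write it without doubling the backslashes a second time.
backslashes = re.findall(r"\\", text)

# An invalid pattern raises re.error when it is compiled.
try:
    re.compile(r"(unclosed")
    compile_failed = False
except re.error:
    compile_failed = True

print(normal_len, raw_len, len(backslashes), compile_failed)
```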

Basic patterns that match single characters

Sr.No.  Expression   Matches
1       a, X, 9, <   Ordinary characters match themselves exactly.
2       .            (a period) Matches any single character except newline '\n'.
3       \w           Matches a "word" character: a letter, digit, or underscore [a-zA-Z0-9_].
4       \W           Matches any non-word character.
5       \b           Matches the boundary between a word and a non-word character.
6       \s           Matches a single whitespace character: space, newline, return, tab.
7       \S           Matches any non-whitespace character.
8       \t, \n, \r   Match tab, newline, and return.
9       \d           Matches a decimal digit [0-9].
10      ^            Matches the start of the string.
11      $            Matches the end of the string.
12      \            Inhibits the "specialness" of a character.

The bracketed ranges describe the ASCII case. For str patterns, Python 3 also matches the Unicode equivalents of \w, \d, and \s unless the ASCII flag is set.
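The basic patterns above can be tried out with re.findall and re.search. This is only a sketch; the sample sentence and the variable names are invented for the demonstration.

```python
import re

sample = "Order 66 shipped\tto A_1 today"

digits = re.findall(r"\d+", sample)           # runs of decimal digits
words = re.findall(r"\w+", sample)            # runs of word characters
spaces = re.findall(r"\s", sample)            # each single whitespace character
starts = bool(re.search(r"^Order", sample))   # ^ anchors at the start
ends = bool(re.search(r"today$", sample))     # $ anchors at the end
whole_word = re.findall(r"\bto\b", sample)    # "to" as a whole word, not inside "today"
literal_dot = re.findall(r"\.", "a.b")        # backslash removes the special meaning of .

print(digits, words, whole_word, literal_dot)
```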

Compilation Flags

Compilation flags let you modify some aspects of how regular expressions work. Flags are available in the re module under two names: a long name such as IGNORECASE and a short, one-letter form such as I.

Sr.No.  Flag           Meaning
1       ASCII, A       Makes several escapes such as \w, \b, \s, and \d match only ASCII characters with the respective property.
2       DOTALL, S      Makes . match any character, including newlines.
3       IGNORECASE, I  Does case-insensitive matching.
4       LOCALE, L      Does a locale-aware match.
5       MULTILINE, M   Enables multi-line matching, affecting ^ and $.
6       VERBOSE, X     (for "extended") Enables verbose REs, which can be organized more cleanly and understandably.
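As a rough illustration of three of these flags, the sketch below runs the same kind of pattern with and without each flag. The two-line sample text and the variable names are assumptions made for the example.

```python
import re

text = "first line\nSecond LINE"

# IGNORECASE: "line" also matches "LINE"
case_insensitive = re.findall(r"line", text, re.IGNORECASE)

# MULTILINE: ^ matches at the start of every line, not only of the string
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

# DOTALL: . also matches the newline, so .* can span both lines
without_dotall = re.match(r".*", text).group()
with_dotall = re.match(r".*", text, re.DOTALL).group()

# The long and the short name refer to the same flag
same_flag = re.IGNORECASE == re.I

print(case_insensitive, line_starts, repr(without_dotall), repr(with_dotall))
```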

The match Function

This function attempts to match the RE pattern against string, with optional flags.

Here is the syntax for this function:

re.match(pattern, string, flags=0)

Here is the description of the parameters:

Sr.No.  Parameter  Description
1       pattern    The regular expression to be matched.
2       string     The string that is searched for a match of the pattern at its beginning.
3       flags      Modifiers that you can combine using bitwise OR (|). They are listed in the table of option flags below.

The re.match function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.

Sr.No.  Match Object Method  Description
1       group(num=0)         Returns the entire match (or the specific subgroup num).
2       groups()             Returns all matching subgroups in a tuple (empty if there weren't any).

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# Group 1 is everything before " are "; group 2 is the following word (non-greedy)
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

When the above code is executed, it produces the following result:

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
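The groups() method and the None result are described above but not exercised by the example, so the following sketch adds them. It reuses the same sentence; the variable names are illustrative only.

```python
import re

line = "Cats are smarter than dogs"

m = re.match(r"(.*) are (.*?) .*", line)
all_groups = m.groups()          # tuple with every captured subgroup
first, second = m.group(1, 2)    # group() also accepts several group numbers

# "dogs" is not at the beginning of the string, so match returns None
no_match = re.match(r"dogs", line)

print(all_groups, first, second, no_match)
```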

The search Function

This function searches for the first occurrence of the RE pattern within string, with optional flags.

Here is the syntax for this function:

re.search(pattern, string, flags=0)

Here is the description of the parameters:

Sr.No.  Parameter  Description
1       pattern    The regular expression to be matched.
2       string     The string that is searched for a match of the pattern anywhere within it.
3       flags      Modifiers that you can combine using bitwise OR (|). They are listed in the table of option flags below.

The re.search function returns a match object on success and None on failure. We use the group(num) or groups() method of the match object to get the matched expression.

Sr.No.  Match Object Method  Description
1       group(num=0)         Returns the entire match (or the specific subgroup num).
2       groups()             Returns all matching subgroups in a tuple (empty if there weren't any).

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# search scans the whole string; here the match happens to start at position 0
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

When the above code is executed, it produces the following result:

searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

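Matching Versus Searching

Python offers two different primitive operations based on regular expressions: match checks for a match only at the beginning of the string, while search checks for a match anywhere in the string (this is what Perl does by default).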

#!/usr/bin/python3
import re

line = "Cats are smarter than dogs"

# match only looks at the beginning of the string, so "dogs" is not found
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() : ", matchObj.group())
else:
    print("No match!!")

# search scans the whole string and finds "dogs" at the end
searchObj = re.search(r'dogs', line, re.M | re.I)
if searchObj:
    print("search --> searchObj.group() : ", searchObj.group())
else:
    print("Nothing found!!")

When the above code is executed, it produces the following result:

No match!!
search --> searchObj.group() :  dogs

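Search and Replace

One of the most important re methods that use regular expressions is sub.

Syntax

re.sub(pattern, repl, string, count=0, flags=0)

This method replaces occurrences of the RE pattern in string with repl. It substitutes all occurrences unless count is provided, in which case at most count replacements are made. This method returns the modified string.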

#!/usr/bin/python3
import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

When the above code is executed, it produces the following result:

Phone Num :  2004-959-559
Phone Num :  2004959559
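The example above always replaces every occurrence. To show the effect of limiting the number of replacements, here is a small sketch using the count argument of re.sub; the shortened phone string and the variable names are chosen only for this illustration.

```python
import re

phone = "2004-959-559"

# Without count, every hyphen is replaced
all_replaced = re.sub(r"-", " ", phone)

# With count=1, only the first occurrence is replaced
first_only = re.sub(r"-", " ", phone, count=1)

print(all_replaced)
print(first_only)
```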

Regular Expression Modifiers: Option Flags

Regular expression literals may include an optional modifier to control various aspects of matching. The modifiers are specified as an optional flag. You can provide multiple modifiers using bitwise OR (|), as shown previously, and each may be represented by one of these:

Sr.No.  Modifier  Description
1       re.I      Performs case-insensitive matching.
2       re.L      Interprets words according to the current locale. This interpretation affects the alphabetic group (\w and \W), as well as word boundary behavior (\b and \B). In Python 3 this flag can be used only with bytes patterns.
3       re.M      Makes $ match the end of a line (not just the end of the string) and makes ^ match the start of any line (not just the start of the string).
4       re.S      Makes a period (dot) match any character, including a newline.
5       re.U      Interprets letters according to the Unicode character set. This flag affects the behavior of \w, \W, \b, \B. In Python 3 this is already the default for str patterns, so the flag is redundant there.
6       re.X      Permits "cuter" regular expression syntax. It ignores whitespace (except inside a set [] or when escaped by a backslash) and treats unescaped # as a comment marker.
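To illustrate combining modifiers with bitwise OR, the sketch below uses re.X together with re.I. The pattern layout, the "phone:" label, and the name phone_re are assumptions made for the example, not something the original text prescribes.

```python
import re

# re.X lets the pattern span several lines and carry comments;
# re.I is combined with it using bitwise OR.
phone_re = re.compile(r"""
    phone:\s*   # label, matched case-insensitively because of re.I
    (\d{4})     # first block: four digits
    -           # separator
    (\d{3})     # second block: three digits
    -
    (\d{3})     # third block: three digits
""", re.X | re.I)

m = phone_re.search("Phone: 2004-959-559")
blocks = m.groups()

print(blocks)
```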

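Regular Expression Patterns

Except for the control characters (+ ? . * ^ $ ( ) [ ] { } | \), all characters match themselves. You can escape a control character by preceding it with a backslash.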
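A short sketch of escaping follows. It escapes by hand with backslashes and also uses re.escape, a standard-library helper that adds the backslashes automatically; the sample text and the variable names are invented for the example.

```python
import re

price = "Total: $5.00 (approx.)"

# Unescaped, . matches any character, so "5.00" also matches "5x00"
loose = bool(re.search(r"5.00", "5x00"))

# Escaped with a backslash, $ and . match themselves
escaped = re.search(r"\$5\.00", price).group()

# re.escape builds an escaped pattern from arbitrary literal text
literal = "(approx.)"
found = re.search(re.escape(literal), price).group()

print(loose, escaped, found)
```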

The examples below illustrate the regular expression syntax that is available in Python.

Regular Expression Examples

Literal characters

Sr.No.  Example  Description
1       python   Match "python".

Character classes

Sr.No.  Example       Description
1       [Pp]ython     Match "Python" or "python".
2       rub[ye]       Match "ruby" or "rube".
3       [aeiou]       Match any one lowercase vowel.
4       [0-9]         Match any digit; same as [0123456789].
5       [a-z]         Match any lowercase ASCII letter.
6       [A-Z]         Match any uppercase ASCII letter.
7       [a-zA-Z0-9]   Match any of the above.
8       [^aeiou]      Match anything other than a lowercase vowel.
9       [^0-9]        Match anything other than a digit.
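A few of the character classes above, run against a made-up sentence. The text and the variable names are illustrative assumptions.

```python
import re

text = "Python and python, ruby and rube, 2024"

names = re.findall(r"[Pp]ython", text)        # either capitalisation
rubs = re.findall(r"rub[ye]", text)           # "rub" followed by y or e
vowels = re.findall(r"[aeiou]", "regex")      # one lowercase vowel at a time
digits_only = re.sub(r"[^0-9]", "", text)     # drop everything that is not a digit

print(names, rubs, vowels, digits_only)
```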

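Special Character Classes

Sr.No.  Example  Description
1       .        Match any character except newline.
2       \d       Match a digit: [0-9]
3       \D       Match a nondigit: [^0-9]
4       \s       Match a whitespace character: [ \t\r\n\f\v]
5       \S       Match nonwhitespace: [^ \t\r\n\f\v]
6       \w       Match a single word character: [A-Za-z0-9_]
7       \W       Match a nonword character: [^A-Za-z0-9_]

The bracketed equivalents hold exactly when the ASCII flag is set. For str patterns without it, \d, \s, and \w also match the corresponding Unicode characters.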
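The sketch below shows how the shorthand classes behave on str patterns, including the difference the ASCII flag makes for \d. The sample strings (one of which uses Arabic-Indic digits written as escapes) and the variable names are assumptions chosen for the demonstration.

```python
import re

# The first number is written with Arabic-Indic digits (U+0664, U+0662)
text = "room \u0664\u0662 and room 42"

unicode_digits = re.findall(r"\d+", text)            # Unicode-aware by default
ascii_digits = re.findall(r"\d+", text, re.ASCII)    # restricted to [0-9]

fields = re.split(r"\s+", "a b\tc\nd")               # split on runs of whitespace
non_word = re.findall(r"\W", "a-b_c!")               # underscore counts as a word character

print(len(unicode_digits), ascii_digits, fields, non_word)
```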

Repetition Cases

Sr.No.  Example   Description
1       ruby?     Match "rub" or "ruby": the y is optional.
2       ruby*     Match "rub" plus 0 or more ys.
3       ruby+     Match "rub" plus 1 or more ys.
4       \d{3}     Match exactly 3 digits.
5       \d{3,}    Match 3 or more digits.
6       \d{3,5}   Match 3, 4, or 5 digits.
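One way to check these repetition operators is to test each pattern against whole candidate strings with re.fullmatch. The helper function matches and the candidate lists are hypothetical, introduced only for this sketch.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

words = ["rub", "ruby", "rubyy"]
numbers = ["12", "123", "1234", "12345", "123456"]

optional_y = matches(r"ruby?", words)        # zero or one y
any_y = matches(r"ruby*", words)             # zero or more ys
some_y = matches(r"ruby+", words)            # one or more ys
exactly_3 = matches(r"\d{3}", numbers)       # exactly three digits
at_least_3 = matches(r"\d{3,}", numbers)     # three or more digits
three_to_5 = matches(r"\d{3,5}", numbers)    # three, four, or five digits

print(optional_y, any_y, some_y)
print(exactly_3, at_least_3, three_to_5)
```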

Nongreedy Repetition

This matches the smallest number of repetitions:

Sr.No.  Example  Description
1       <.*>     Greedy repetition: matches "<python>perl>".
2       <.*?>    Nongreedy: matches "<python>" in "<python>perl>".
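The difference between greedy and nongreedy repetition is easy to see by running both patterns on the same input. This is a minimal sketch; the variable names are illustrative.

```python
import re

text = "<python>perl>"

# Greedy: .* grabs as much as possible, so the match runs to the last ">"
greedy = re.match(r"<.*>", text).group()

# Nongreedy: .*? grabs as little as possible, so the match stops at the first ">"
lazy = re.match(r"<.*?>", text).group()

print(greedy)
print(lazy)
```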

Grouping with Parentheses

Sr.No.  Example             Description
1       \D\d+               No group: + repeats \d.
2       (\D\d)+             Grouped: + repeats the \D\d pair.
3       ([Pp]ython(, )?)+   Match "Python", "Python, python, python", etc.
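The sketch below contrasts an ungrouped and a grouped repetition on the same input. The sample string "a1b2c3" and the variable names are assumptions for the example, and the last pattern is written with a space after the comma so that it can match a comma-separated list.

```python
import re

text = "a1b2c3"

# Without a group, + applies to \d only
ungrouped = re.match(r"\D\d+", text).group()

# With a group, + repeats the whole \D\d pair
grouped = re.match(r"(\D\d)+", text)
whole = grouped.group()        # everything the repeated group covered
last_pair = grouped.group(1)   # a repeated group keeps only its last iteration

# A group followed by an optional ", " matches a comma-separated list
listing = re.fullmatch(r"([Pp]ython(, )?)+", "Python, python, python")

print(ungrouped, whole, last_pair, bool(listing))
```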

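Backreferences

This matches a previously matched group again:

Sr.No.  Example              Description
1       ([Pp])ython&\1ails   Match python&pails or Python&Pails.
2       (['"])[^\1]*\1       Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc. Note that in Python \1 inside a character class is read as an octal escape, not a backreference, so [^\1] does not actually exclude the opening quote.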
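Here is a small sketch of backreferences in use. For quoted strings it swaps in a nongreedy (.*?) followed by \1, which is one common way to stop at the matching closing quote; that variant, the sample strings, and the names same_case and quoted are assumptions for this example rather than the table's own pattern.

```python
import re

# \1 must repeat whatever group 1 matched, so the case has to agree
same_case = re.compile(r"([Pp])ython&\1ails")
candidates = ("python&pails", "Python&Pails", "Python&pails")
agreeing = [s for s in candidates if same_case.fullmatch(s)]

# Group 1 captures the opening quote; \1 requires the same quote to close
quoted = re.compile(r"""(['"])(.*?)\1""")
text = """say "it's" and 'bye'"""
contents = [m.group(2) for m in quoted.finditer(text)]

print(agreeing)
print(contents)
```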

Alternatives

Sr.No.  Example         Description
1       python|perl     Match "python" or "perl".
2       rub(y|le)       Match "ruby" or "ruble".
3       Python(!+|\?)   Match "Python" followed by one or more ! or one ?.
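The alternation patterns above can be exercised as follows. The sketch uses re.finditer with group() for the grouped patterns, because re.findall would return only the group contents; the sample strings and the variable names are invented.

```python
import re

either = re.findall(r"python|perl", "perl, python, ruby")

# group() with no argument returns the whole match, not just the (y|le) part
rubs = [m.group() for m in re.finditer(r"rub(y|le)", "ruby ruble rube")]

# One or more "!" or exactly one "?" after "Python"
marks = [m.group() for m in re.finditer(r"Python(!+|\?)", "Python!!! Python? Python.")]

print(either, rubs, marks)
```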

Anchors

Anchors specify the position at which a match must occur.

Sr.No.  Example       Description
1       ^Python       Match "Python" at the start of a string, or at the start of an internal line when re.M is set.
2       Python$       Match "Python" at the end of a string, or at the end of a line when re.M is set.
3       \APython      Match "Python" at the start of a string.
4       Python\Z      Match "Python" at the end of a string.
5       \bPython\b    Match "Python" at a word boundary.
6       \brub\B       \B is a nonword boundary: match "rub" in "rube" and "ruby" but not alone.
7       Python(?=!)   Match "Python", if followed by an exclamation point.
8       Python(?!!)   Match "Python", if not followed by an exclamation point.
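A brief sketch of how the anchors and lookaheads behave in Python. The two-line sample text, the other sample strings, and the variable names are assumptions made for the illustration.

```python
import re

text = "Python rocks\nI like Python"

# ^ only sees the start of the string unless re.M is set
caret_default = re.findall(r"^I", text)
caret_multiline = re.findall(r"^I", text, re.M)

# \A ignores re.M and always means the start of the string
a_multiline = re.findall(r"\AI", text, re.M)

# \B requires that "rub" is followed by another word character
inside_word = re.findall(r"\brub\B", "rub rube ruby")

# Lookaheads test what follows without consuming it
followed = [m.start() for m in re.finditer(r"Python(?=!)", "Python! Python?")]
not_followed = [m.start() for m in re.finditer(r"Python(?!!)", "Python! Python?")]

print(caret_default, caret_multiline, a_multiline)
print(inside_word, followed, not_followed)
```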

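Special Syntax with Parentheses

Sr.No.  Example        Description
1       R(?#comment)   Matches "R". All the rest is a comment.
2       R(?i)uby       Intended as case-insensitive while matching "uby". In Python, however, a global inline flag such as (?i) applies to the whole pattern and must appear at its start; mid-pattern use is deprecated since Python 3.6 and raises an error from Python 3.11.
3       R(?i:uby)      Case-insensitive while matching "uby" only (scoped flag, Python 3.6 and later).
4       rub(?:y|le)    Group only, without creating a \1 backreference.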
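The forms that Python accepts can be checked with a short sketch like the one below. It assumes Python 3.6 or later for the scoped (?i:...) form, and the sample strings and variable names are illustrative.

```python
import re

# (?#...) is a comment and matches nothing
comment = re.fullmatch(r"R(?#just a comment)", "R")

# A global inline flag at the start applies to the whole pattern
global_flag = re.fullmatch(r"(?i)ruby", "RUBY")

# A scoped flag applies only inside its own parentheses
scoped_hit = re.fullmatch(r"R(?i:uby)", "RUBY")
scoped_miss = re.fullmatch(r"R(?i:uby)", "ruby")   # the leading R stays case-sensitive

# (?:...) groups without capturing, so no group 1 is created
non_capturing = re.search(r"rub(?:y|le)", "ruble").groups()
capturing = re.search(r"rub(y|le)", "ruble").groups()

print(bool(comment), bool(global_flag), bool(scoped_hit), scoped_miss)
print(non_capturing, capturing)
```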