📜  Python Regex Cheat Sheet

📅  Last modified: 2022-05-13 01:54:52.097000             🧑  Author: Mango

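A regular expression, or regex, is an important part of programming in Python and in most other languages. It is used to search for, and optionally replace, a specified text pattern. In a regular expression, a sequence of characters together forms the search pattern, also called a regex pattern. The hard part of regex is not learning or understanding it, but remembering the syntax and how to build a pattern for a given requirement. This cheat sheet therefore collects the character classes, special characters, modifiers, sets, and other constructs used in regular expressions.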

Basic characters:

Expression    Explanation
^             Matches the expression to its right at the start of the string; in MULTILINE mode, also right after each newline.
$             Matches the expression to its left at the end of the string or just before a trailing newline; in MULTILINE mode, also before each newline.
.             Matches any character except a newline.
a             Matches exactly one character a.
xy            Matches the string xy.
a|b           Matches expression a or b. If a matches first, b is left untried.

Example:

Python3
import re

# ^x matches x only at the start of the string
print(re.search(r"^x", "xenon"))
# s$ matches s only at the end of the string
print(re.search(r"s$", "geeks"))



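Output:

<re.Match object; span=(0, 1), match='x'>
<re.Match object; span=(4, 5), match='s'>

Explanation:

First, import the regular expression module with import re.

In the first example, we search the word "xenon" for the pattern ^x. The ^ character matches the expression to its right at the start of the string, so ^x looks for the character x at the start of the string. Since xenon starts with x, the search finds a match and returns the matched text ('x') and its position (0, 1).

Similarly, in the second example s$ looks for the character s at the end of the string. Since geeks ends with s, the search finds a match and returns the matched text ('s') and its position (4, 5).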
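The remaining rows of the table can be checked the same way. The following is a minimal sketch; the sample strings and variable names are made up for illustration.

```python
import re

# Alternation tries the left branch first: "ab" wins even though
# the right branch "abc" would also match at the same position.
first_alt = re.search(r"ab|abc", "abc").group()

# '.' matches any single character except a newline,
# so "g\ns" is skipped while "gas" and "gus" are found.
dot_matches = re.findall(r"g.s", "gas g\ns gus")

# '$' also matches just before a trailing newline.
end_match = re.search(r"s$", "geeks\n")

print(first_alt)         # ab
print(dot_matches)       # ['gas', 'gus']
print(end_match.span())  # (4, 5)
```

Putting the longer alternative first (abc|ab) is the usual way to prefer the longer match.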

Quantifiers:

Expression    Explanation
+             Matches the expression to its left 1 or more times.
*             Matches the expression to its left 0 or more times.
?             Matches the expression to its left 0 or 1 times.
{p}           Matches the expression to its left exactly p times.
{p,q}         Matches the expression to its left at least p and at most q times.
{p,}          Matches the expression to its left p or more times.
{,q}          Matches the expression to its left up to q times.

Quantifiers are greedy by default: they match as much text as possible. Adding ? after a quantifier (+?, *?, ??, or {p,q}?) makes it non-greedy, so it matches as little as possible. Write the braces without spaces, because Python treats {p, q} as literal text rather than as a quantifier.

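Example:

Python3

import re

# 9+ matches one or more consecutive 9s
print(re.search(r"9+", "289908"))
# \d{3} matches exactly three digits
print(re.search(r"\d{3}", "hello1234"))

Output:

<re.Match object; span=(2, 4), match='99'>
<re.Match object; span=(5, 8), match='123'>

Explanation:

In the first example, 9+ searches for the digit 9 one or more times. Since 289908 contains two consecutive 9s, the regex matches them and prints the match '99' and its position (2, 4).

In the second example, \d{3} searches for exactly three digits. Since hello1234 contains digits, it matches the first three it encounters, 123, and leaves the 4 out because {3} matches exactly three digits. It therefore prints the match '123' and its position (5, 8).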
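The difference between greedy and non-greedy matching is easiest to see side by side. This is a small sketch with an invented sample string; it also shows a bounded {p,q} quantifier.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ grabs as much as possible, up to the last '>'.
greedy = re.search(r"<.+>", text).group()

# Non-greedy: .+? stops at the first '>' that allows a match.
lazy = re.search(r"<.+?>", text).group()

# {2,3} matches runs of two or three digits; a run of four
# yields a three-digit match and leaves one digit unmatched.
bounded = re.findall(r"\d{2,3}", "1 22 333 4444")

print(greedy)   # <b>bold</b> and <i>italic</i>
print(lazy)     # <b>
print(bounded)  # ['22', '333', '444']
```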

Character classes:

Expression    Explanation
\w            Matches word characters, that is a-z, A-Z, 0-9, and underscore (_). For str patterns, other Unicode letters and digits also match unless the ASCII flag is set.
\W            Matches non-word characters, that is anything except a-z, A-Z, 0-9, and _.
\d            Matches digits, from 0-9.
\D            Matches any non-digit.
\s            Matches whitespace characters, including space, \t, \n, and \r.
\S            Matches non-whitespace characters.
\A            Matches the expression to its right only at the absolute start of the string, in both single-line and multi-line mode.
\Z            Matches the expression to its left only at the absolute end of the string, in both single-line and multi-line mode.
\n            Matches a newline character.
\t            Matches a tab character.
\b            Matches the empty string at a word boundary, that is, at the start or end of a word.
\B            Matches where \b does not, that is, at a non-word boundary.

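Example:

Python3

import re

# \s matches the first whitespace character
print(re.search(r"\s", "xenon is a gas"))
# \D+\d* matches one or more non-digits followed by zero or more digits
print(re.search(r"\D+\d*", "123geeks123"))

Output:

<re.Match object; span=(5, 6), match=' '>
<re.Match object; span=(3, 11), match='geeks123'>

Explanation:

In the first example, \s searches for a whitespace character and reports the first one it finds. Since "xenon is a gas" contains spaces, the first space is matched, giving the match ' ' and its position (5, 6).

In the second example, \D+\d* searches for one or more non-digit characters followed by zero or more digits. The leading 123 cannot start a match because \D+ needs at least one non-digit, so the match begins at geeks and then takes the trailing 123. It therefore prints the match 'geeks123' and its position (3, 11).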
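The boundary and anchor classes (\b, \B, \A, \Z) do not consume characters, which makes them harder to picture than \d or \s. A brief sketch with made-up sample text:

```python
import re

text = "cat concat catalog"

# \bcat\b: "cat" as a whole word only.
whole_word = [m.span() for m in re.finditer(r"\bcat\b", text)]

# \Bcat\b: "cat" at the end of a longer word ("concat").
word_end = [m.span() for m in re.finditer(r"\Bcat\b", text)]

lines = "one\ntwo"

# In MULTILINE mode ^ matches at every line start,
# but \A and \Z still refer to the whole string.
caret = re.findall(r"^\w+", lines, flags=re.MULTILINE)
abs_start = re.findall(r"\A\w+", lines, flags=re.MULTILINE)
abs_end = re.findall(r"\w+\Z", lines, flags=re.MULTILINE)

print(whole_word)  # [(0, 3)]
print(word_end)    # [(7, 10)]
print(caret)       # ['one', 'two']
print(abs_start)   # ['one']
print(abs_end)     # ['two']
```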

Sets:

Expression    Explanation
[abc]         Matches either a, b, or c. It does not match the string abc.
[a-z]         Matches any lowercase letter from a to z.
[A-Z]         Matches any uppercase letter from A to Z.
[a\-p]        Matches a, -, or p. It matches - because \ escapes it.
[-z]          Matches - or z.
[a-z0-9]      Matches characters from a to z or from 0 to 9.
[(+*)]        Special characters become literal inside a set, so this matches (, +, *, or ).
[^ab5]        Adding ^ at the start negates the set. Here, it matches any character that is not a, b, or 5.
\[a\]         Matches the string [a] because both square brackets [ ] are escaped.

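Example:

Python3

import re

# [^abc] matches the first character that is not a, b, or c
print(re.search(r"[^abc]", "abcde"))
# [a-p] matches the first character in the range a to p
print(re.search(r"[a-p]", "xenon"))

Output:

<re.Match object; span=(3, 4), match='d'>
<re.Match object; span=(1, 2), match='e'>

Explanation:

In the first example, [^abc] searches for anything except a, b, and c, so the regex matches the first character that is not a, b, or c. In abcde that character is d, so the match is 'd' and its position is (3, 4).

In the second example, [a-p] searches for a character between a and p. In xenon, the first character in that range is e, so the match is 'e' and its position is (1, 2).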
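The literal-character rules for sets from the table can be tried directly. A minimal sketch with invented inputs:

```python
import re

# Inside a set, ( + * ) are ordinary characters.
ops = re.findall(r"[(+*)]", "(a+b)*c")

# An escaped hyphen is a literal '-', not a range operator,
# so 'm' (which lies between a and p) is not matched.
hyphens = re.findall(r"[a\-p]", "a-p m")

# Ranges can be combined, and a leading ^ negates the whole set.
non_alnum = re.findall(r"[^a-z0-9]", "ab5 c-9!")

print(ops)        # ['(', '+', ')', '*']
print(hyphens)    # ['a', '-', 'p']
print(non_alnum)  # [' ', '-', '!']
```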

Groups:

Expression      Explanation
( )             Matches the expression inside the parentheses and captures it as a group that can be retrieved later.
(?#…)           A comment; its contents are ignored.
(?P<name>AB)    Matches the expression AB and captures it in a group that can be retrieved by the group name.
(?:A)           Matches the expression represented by A without capturing it, so it cannot be retrieved afterwards.
(?P=group)      Matches the same text that was matched by an earlier group named "group".

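Example:

Python3

import re

# (?:AB) is a non-capturing group
example = re.search(r"(?:AB)", "ACABC")
print(example)
print(example.groups())

# Two capturing groups separated by a comma and a space
result = re.search(r"(\w*), (\w*)", "geeks, best")
print(result.groups())

Output:

<re.Match object; span=(2, 4), match='AB'>
()
('geeks', 'best')

Explanation:

In the first example, (?:AB) searches for the expression AB and prints the match and its position. Since ACABC contains AB, it prints the match 'AB' and its position (2, 4). Because the group is non-capturing, nothing is stored for later retrieval, so printing the groups gives an empty tuple.

In the second example, we capture two groups: one with 0 or more word characters, followed by a comma and a space, and then another with 0 or more word characters. In "geeks, best", geeks and best are captured as the first and second groups. When we print the groups, we get ('geeks', 'best').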
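Named groups and named backreferences from the table are not covered by the example above. The sketch below reuses the "geeks, best" string; the group names first, second, and word are chosen only for illustration.

```python
import re

# (?P<name>...) captures a group that can be read back by name.
m = re.search(r"(?P<first>\w+), (?P<second>\w+)", "geeks, best")
by_name = (m.group("first"), m.group("second"))
as_dict = m.groupdict()

# (?P=word) matches the same text the group "word" matched,
# which makes it handy for spotting a doubled word.
repeat = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")

print(by_name)               # ('geeks', 'best')
print(as_dict)               # {'first': 'geeks', 'second': 'best'}
print(repeat.group())        # is is
print(repeat.group("word"))  # is
```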

Assertions:

Expression            Explanation
A(?=B)                Matches the expression A only if it is followed by B. (Positive lookahead assertion)
A(?!B)                Matches the expression A only if it is not followed by B. (Negative lookahead assertion)
(?<=B)A               Matches the expression A only if B is immediately to its left. (Positive lookbehind assertion)
(?<!B)A               Matches the expression A only if B is not immediately to its left. (Negative lookbehind assertion)
(?(id/name)yes|no)    If-else conditional: matches the yes pattern if the group with the given id or name matched, otherwise the no pattern.

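Example:

Python3

import re

# z(?=a): a z that is followed by a
print(re.search(r"z(?=a)", "pizza"))
# z(?!a): a z that is not followed by a
print(re.search(r"z(?!a)", "pizza"))

Output:

<re.Match object; span=(3, 4), match='z'>
<re.Match object; span=(2, 3), match='z'>

Explanation:

In the first example, z(?=a) searches for the character z followed by the character a. In pizza, the second z is immediately followed by a, so there is a match. The lookahead consumes nothing, so the regex prints the match 'z' without the a, at position (3, 4).

In the second example, z(?!a) searches for a character z that is not followed by the character a. In pizza, the first z is followed by z rather than a, so there is a match. The regex prints the match 'z' and its position (2, 3).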
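The lookbehind and conditional rows of the table work along the same lines. This is a rough sketch; the price string and the angle-bracket pattern are invented for the purpose.

```python
import re

prices = "cost: $40, discount: 15, tax: $3"

# Positive lookbehind: digits immediately preceded by '$'.
after_dollar = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: digits preceded by neither '$' nor a digit.
# Excluding digits too keeps the '0' of "$40" from matching alone.
no_dollar = re.findall(r"(?<![$\d])\d+", prices)

# Conditional: require a closing '>' only if group 1 matched '<'.
cond = re.compile(r"(<)?\w+(?(1)>)")
balanced = [bool(cond.fullmatch(s)) for s in ("<tag>", "tag", "<tag", "tag>")]

print(after_dollar)  # ['40', '3']
print(no_dollar)     # ['15']
print(balanced)      # [True, True, False, False]
```

Note that Python's re module accepts only fixed-width patterns inside a lookbehind.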

Flags:

Expression    Explanation
a             Matches ASCII only (re.ASCII)
i             Ignore case (re.IGNORECASE)
L             Locale-dependent character classes (re.LOCALE)
m             ^ and $ match at the start and end of each line (re.MULTILINE)
s             . matches everything, including newline (re.DOTALL)
u             Matches Unicode character classes (re.UNICODE)
x             Allow whitespace and comments in the pattern (re.VERBOSE)

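Example:

Python3

import re

exp = """hello there
I am from
Geeks for Geeks"""

# IGNORECASE: "and" also matches "And"
print(re.search(r"and", "Sun And Moon", flags=re.IGNORECASE))
# MULTILINE: ^ matches at the start of every line
print(re.findall(r"^\w", exp, flags=re.MULTILINE))

Output:

<re.Match object; span=(4, 7), match='And'>
['h', 'I', 'G']

Explanation:

In the first example, the IGNORECASE flag makes the search ignore case (upper or lower), so the pattern and matches And in the string. It prints the match 'And' and its position (4, 7).

In the second example, the MULTILINE flag makes ^ match at the start of every line, so \w matches whenever a line begins with a word character. All three lines of exp (hello there, I am from, Geeks for Geeks) begin with a word character, so every line matches and the matches are printed as a list: ['h', 'I', 'G'].

Note: with the MULTILINE flag we use re.findall here because there is one match per line; re.search would return only the first.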
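The VERBOSE and DOTALL flags from the table, and combining several flags with |, can be sketched as follows. The date pattern and sample strings are illustrative only.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
""", flags=re.VERBOSE)
parts = date.search("released on 2022-05-13").groups()

# Without re.DOTALL, '.' refuses to match a newline.
without_dotall = re.search(r"a.b", "a\nb")
with_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)

# Several flags are combined with the | operator.
starts = re.findall(r"^geeks", "Geeks\nfor\ngeeks",
                    flags=re.IGNORECASE | re.MULTILINE)

print(parts)               # ('2022', '05', '13')
print(without_dotall)      # None
print(with_dotall.span())  # (0, 3)
print(starts)              # ['Geeks', 'geeks']
```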