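Python Regex Cheat Sheet
Regular expressions, or regex, are an important part of Python programming, as of most other programming languages. A regex is used to search for, and optionally replace, a specified text pattern. The sequence of characters that defines what to search for is called the regex pattern. The hard part of regex is not learning or understanding it but remembering the syntax and how to build a pattern for a given requirement. This cheat sheet therefore collects the character classes, special characters, modifiers, sets, and other constructs used in regular expressions.
Basic characters: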
Expression | Explanations |
---|---|
^ | Matches the expression to its right at the start of the string (in multi-line mode, also at the start of each line) |
$ | Matches the expression to its left at the end of the string or just before a trailing newline (in multi-line mode, also at the end of each line) |
. | Matches any character except newline |
a | Matches exactly one character a |
xy | Matches the string xy |
a|b | Matches expression a or b. If a is matched first, b is left untried. |
Example:
Python3
import re
print(re.search(r"^x","xenon"))
print(re.search(r"s$","geeks"))
Output:
<re.Match object; span=(0, 1), match='x'>
<re.Match object; span=(4, 5), match='s'>
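Explanation:
First, the regex module is imported with the statement import re.
In the first example, the pattern ^x is searched for in the word "xenon". The ^ character matches the expression to its right at the start of the string, so ^x looks for the character x at the start of the string. Since xenon starts with x, the search finds a match and returns a match object holding the match ('x') and its span (0, 1).
Similarly, in the second example s$ looks for the character s at the end of the string. Since geeks ends with s, the search finds the match ('s') at span (4, 5).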
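The remaining rows of the table (the dot, alternation, and the behaviour of ^ around line breaks) can be sketched in the same way. The variable names and sample strings below are illustrative choices, not part of the original examples.

```python
import re

# "." matches any single character except a newline.
dot_match = re.search(r"x.n", "xenon")
print(dot_match.group())   # 'xen'

# "a|b" tries the left alternative first at each position in the string.
# 'xenon' wins here because it occurs earlier in the text than 'gas'.
alt_match = re.search(r"gas|xenon", "xenon is a gas")
print(alt_match.group())   # 'xenon'

# Without re.MULTILINE, "^" only matches at the very start of the string,
# so 'gas' on the second line is not found.
no_match = re.search(r"^gas", "xenon\ngas")
print(no_match)            # None
```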
Quantifiers:
Expressions | Explanations |
---|---|
+ | Matches the expression to its left 1 or more times. |
* | Matches the expression to its left 0 or more times. |
? | Matches the expression to its left 0 or 1 times |
{p} | Matches the expression to its left exactly p times. |
{p,q} | Matches the expression to its left from p to q times. No space is allowed inside the braces. |
{p,} | Matches the expression to its left p or more times. |
{,q} | Matches the expression to its left up to q times |
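By default these quantifiers are greedy: they match as much text as possible. If ? is appended to a quantifier (+, *, or ? itself), the match becomes non-greedy and takes as little text as possible.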
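The difference between greedy and non-greedy matching is easiest to see side by side. The following is a minimal sketch; the sample string and the variable names are made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: ".+" grabs as much as it can, so the match runs from the
# first "<" to the last ">" in the string.
greedy = re.search(r"<.+>", text)

# Non-greedy: ".+?" stops at the first ">" that lets the pattern succeed.
lazy = re.search(r"<.+?>", text)

print(greedy.group())  # <b>bold</b> and <i>italic</i>
print(lazy.group())    # <b>
```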
Example:
Python3
import re
print(re.search(r"9+","289908"))
print(re.search(r"\d{3}","hello1234"))
Output:
<re.Match object; span=(2, 4), match='99'>
<re.Match object; span=(5, 8), match='123'>
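Explanation:
In the first example, 9+ searches for the digit 9 one or more times. Since 289908 contains two consecutive 9s, the regex matches them and prints the match ('99') and its span (2, 4).
In the second example, \d{3} searches for exactly three consecutive digits. Since hello1234 contains digits, it matches the first three it encounters, 123, and leaves out the 4 because {3} matches exactly three digits. It prints the match ('123') and its span (5, 8).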
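The counted forms of the quantifier table can be tried out as follows. This is a small sketch with an assumed sample string; the last call shows that whitespace inside the braces turns the construct into literal text.

```python
import re

digits = "hello1234567"

exactly_three = re.search(r"\d{3}", digits).group()   # exactly 3 digits
two_to_four = re.search(r"\d{2,4}", digits).group()   # greedy: takes 4 digits
five_or_more = re.search(r"\d{5,}", digits).group()   # takes all 7 digits
up_to_two = re.search(r"\d{,2}", "1234567").group()   # at most 2 digits

print(exactly_three, two_to_four, five_or_more, up_to_two)

# With a space inside the braces, Python treats "{2, 4}" as literal text,
# so this pattern no longer means "2 to 4 digits" and finds nothing here.
spaced = re.search(r"\d{2, 4}", "1234")
print(spaced)
```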
Character classes:
Expressions | Explanations |
---|---|
\w | Matches alphanumeric characters, that is a-z, A-Z, 0-9, and underscore(_) |
\W | Matches non-alphanumeric characters, that is except a-z, A-Z, 0-9 and _ |
\d | Matches digits, from 0-9. |
\D | Matches any non-digits. |
\s | Matches whitespace characters, which also include the \t, \n, \r, and space characters. |
\S | Matches non-whitespace characters. |
\A | Matches the expression to its right at the absolute start of a string whether in single or multi-line mode. |
\Z | Matches the expression to its left at the absolute end of a string whether in single or multi-line mode. |
\n | Matches a newline character |
\t | Matches tab character |
\b | Matches the word boundary (or empty string) at the start and end of a word. |
\B | Matches where \b does not, that is, non-word boundary |
Example:
Python3
import re
print(re.search(r"\s","xenon is a gas"))
print(re.search(r"\D+\d*","123geeks123"))
Output:
<re.Match object; span=(5, 6), match=' '>
<re.Match object; span=(3, 11), match='geeks123'>
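Explanation:
In the first example, \s searches for a whitespace character and reports the first one it meets. Since "xenon is a gas" contains spaces, it matches the first space and prints the match (' ') and its span (5, 6).
In the second example, \D+\d* searches for one or more non-digit characters followed by zero or more digits. In 123geeks123, the substring geeks123 fits this description: one or more non-digit characters (geeks) followed by zero or more digits (123). The leading 123 is skipped because the match has to start with a non-digit. It prints the match ('geeks123') and its span (3, 11).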
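The anchors in the table (\b, \B, \A, \Z) match positions rather than characters, which the examples above do not show. A short sketch, using sample strings chosen only for illustration:

```python
import re

sentence = "cat concatenate cat"

# \b requires a word boundary on each side, so only standalone "cat" matches.
whole_words = re.findall(r"\bcat\b", sentence)
# \B requires the opposite: "cat" must sit inside a longer word.
inside_word = re.findall(r"\Bcat\B", sentence)
print(whole_words, inside_word)

lines = "one\ntwo"

# In MULTILINE mode "^" matches at the start of every line ...
caret = re.findall(r"^\w+", lines, flags=re.MULTILINE)
# ... while \A and \Z still refer to the start and end of the whole string.
abs_start = re.findall(r"\A\w+", lines, flags=re.MULTILINE)
abs_end = re.findall(r"\w+\Z", lines, flags=re.MULTILINE)
print(caret, abs_start, abs_end)
```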
Sets:
Expressions | Explanations |
---|---|
[abc] | Matches either a, b, or c. It does not match abc. |
[a-z] | Matches any lowercase letter from a to z. |
[A-Z] | Matches any uppercase letter from A to Z. |
[a\-p] | Matches a, -, or p. It matches - because \ escapes it. |
[-z] | Matches - or z |
[a-z0-9] | Matches characters from a to z or from 0 to 9. |
[(+*)] | Special characters become literal inside a set, so this matches (, +, *, or ) |
[^ab5] | Adding ^ excludes any character in the set. Here, it matches characters that are not a, b, or 5. |
\[a\] | Matches the literal text [a] because both brackets [ ] are escaped |
Example:
Python3
import re
print(re.search(r"[^abc]","abcde"))
print(re.search(r"[a-p]","xenon"))
Output:
<re.Match object; span=(3, 4), match='d'>
<re.Match object; span=(1, 2), match='e'>
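Explanation:
In the first example, [^abc] searches for any character other than a, b, or c, so the regex matches the first character that is none of these and prints it. In abcde, the first such character is d, so the match is ('d') and its span is (3, 4).
In the second example, [a-p] searches for a character between a and p. In xenon, the first character in that range is e (x lies outside it), so the match is ('e') and its span is (1, 2).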
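The escaping rules listed in the table can be checked with re.findall. The sample strings below are assumptions made for the sake of the sketch.

```python
import re

# An escaped hyphen is a literal "-", not a range operator.
hyphen = re.findall(r"[a\-p]", "a-p-z")

# Special characters lose their meaning inside a set.
specials = re.findall(r"[(+*)]", "(1+2)*3")

# A leading ^ negates the set.
negated = re.findall(r"[^ab5]", "ab5c6")

# Escaped brackets match the literal text "[a]".
literal = re.search(r"\[a\]", "x[a]y").group()

print(hyphen, specials, negated, literal)
```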
Groups:
Expressions | Explanations |
---|---|
( ) | Matches the expression inside the parentheses and groups it which we can capture as required |
(?#…) | A comment; its contents are ignored by the regex engine |
(?P<name>AB) | Matches the expression AB, which can be retrieved by the group name "name". |
(?:A) | Matches the expression as represented by A, but cannot be retrieved afterwards. |
(?P=group) | Matches the expression matched by an earlier group named “group” |
Example:
Python3
import re
example = re.search(r"(?:AB)","ACABC")
print(example)
print(example.groups())
result = re.search(r"(\w*), (\w*)","geeks, best")
print(result.groups())
Output:
<re.Match object; span=(2, 4), match='AB'>
()
('geeks', 'best')
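Explanation:
In the first example, (?:AB) searches for the expression AB and prints the match and its position. Since ACABC contains AB, it prints the match ('AB') and its span (2, 4), but as noted above a non-capturing group cannot be retrieved afterwards. Printing the groups of this match therefore shows an empty tuple.
In the second example, two groups are captured: one group of zero or more alphanumeric characters, followed by a comma and a space, then another group of zero or more alphanumeric characters. In "geeks, best", geeks and best are captured as the first and second group, so printing the groups gives ('geeks', 'best').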
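Named groups and named backreferences from the table work as sketched below. The group names first, second, and word are hypothetical labels picked for this illustration.

```python
import re

# (?P<name>...) captures like ( ) but the group can also be read by name.
named = re.search(r"(?P<first>\w+), (?P<second>\w+)", "geeks, best")
print(named.group("first"))   # 'geeks'
print(named.groupdict())      # {'first': 'geeks', 'second': 'best'}

# (?P=word) matches whatever text the group "word" matched earlier,
# which makes it easy to spot a word repeated twice in a row.
repeated = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")
print(repeated.group())       # 'is is'
```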
Assertions:
Expression | Explanation |
---|---|
A(?=B) | This matches the expression A only if it is followed by B. (Positive look ahead assertion) |
A(?!B) | This matches the expression A only if it is not followed by B. (Negative look ahead assertion) |
(?<=B)A | This matches the expression A only if B is immediate to its left. (Positive look behind assertion) |
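(?<!B)A | This matches the expression A only if B is not immediately to its left. (Negative look behind assertion) |
(?(id)yes|no) | If-else conditional: matches yes if the group with the given id or name took part in the match, otherwise no |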
Example:
Python3
import re
print(re.search(r"z(?=a)", "pizza"))
print(re.search(r"z(?!a)", "pizza"))
Output:
<re.Match object; span=(3, 4), match='z'>
<re.Match object; span=(2, 3), match='z'>
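Explanation:
In the first example, z(?=a) searches for the character z followed by the character a. In pizza, the second z is immediately followed by a, so there is a match. The regex prints the match ('z'), without the a that follows it, and its span (3, 4).
In the second example, z(?!a) searches for the character z that is not followed by the character a. In pizza, the first z is followed by z rather than a, so there is a match. The regex prints the match ('z') and its span (2, 3).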
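The look-behind assertions and the conditional from the table are not covered by the example above. The following sketch uses invented sample strings; the conditional pattern accepts a word that is either fully wrapped in angle brackets or not wrapped at all.

```python
import re

prices = "cost $40, weight 15kg"

# Positive look-behind: digits only when "$" is immediately to the left.
after_dollar = re.findall(r"(?<=\$)\d+", prices)
# Negative look-behind: whole numbers that do NOT have "$" to the left.
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)
print(after_dollar, not_after_dollar)

# Conditional: if group 1 (the opening "<") matched, a closing ">" is required;
# if it did not match, nothing more is required.
wrapped = r"^(<)?\w+(?(1)>)$"
both = re.search(wrapped, "<tag>")
neither = re.search(wrapped, "tag")
unbalanced = re.search(wrapped, "<tag")
print(both is not None, neither is not None, unbalanced)
```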
Flags:
Expression | Explanation |
---|---|
a | Matches ASCII only |
i | Ignore case |
L | Locale character classes |
m | ^ and $ match start and end of the line (Multi-line) |
s | Makes . match every character, including newline (DOTALL) |
u | Matches Unicode character classes |
x | Allow spaces and comments (Verbose) |
Example:
Python3
import re
exp = """hello there
I am from
Geeks for Geeks"""
print(re.search(r"and", "Sun And Moon", flags=re.IGNORECASE))
print(re.findall(r"^\w", exp, flags = re.MULTILINE))
Output:
<re.Match object; span=(4, 7), match='And'>
['h', 'I', 'G']
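Explanation:
In the first example, the IGNORECASE flag makes the search case-insensitive, so the pattern and matches And in the string. It prints the match ('And') and its span (4, 7).
In the second example, the MULTILINE flag makes ^ match at the start of every line, so ^\w matches each line that begins with an alphanumeric character. Every line of the multi-line string (hello there, I am from, Geeks for Geeks) starts with one, so the regex matches on each line and prints the matches in a list (['h', 'I', 'G']).
Note: with the MULTILINE flag we use re.findall here because there is one match per line; re.search would return only the first.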
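The other flags in the table are used the same way. The sketch below covers DOTALL, VERBOSE, combining flags with |, and the inline form (?i). The phone-number pattern and all variable names are assumptions made for illustration.

```python
import re

text = "first\nsecond"

# By default "." does not match a newline; re.DOTALL (flag s) changes that.
without_dotall = re.search(r"first.second", text)
with_dotall = re.search(r"first.second", text, flags=re.DOTALL)
print(without_dotall, with_dotall.group() == text)

# re.VERBOSE (flag x) lets a pattern carry whitespace and comments.
phone = re.compile(r"""
    \d{3}   # first block of digits
    -       # separator
    \d{4}   # second block of digits
""", flags=re.VERBOSE)
verbose_hit = phone.search("call 555-1234 now").group()
print(verbose_hit)

# Flags can be combined with "|".
combined = re.findall(r"^geeks", "Geeks\ngeeks\nGEEKS",
                      flags=re.IGNORECASE | re.MULTILINE)
print(combined)

# A flag can also be written inline at the start of the pattern.
inline = re.search(r"(?i)and", "Sun And Moon").group()
print(inline)
```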