Regular Expressions in Python with Examples, Part 1

📅 Last modified: 2020-04-15 07:09:42 🧑 Author: Mango

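A regular expression (RE) specifies a set of strings (a pattern) that match it.
To understand REs it helps to know the metacharacters, which are used throughout the functions of the re module.
There are 14 metacharacters in total (counting each bracket, brace and parenthesis separately, plus the backslash); they are discussed alongside the functions: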

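\   Escapes a metacharacter or introduces a special sequence (discussed below)
[]  Represents a character class
^   Matches the start
$   Matches the end
.   Matches any character except a newline
?   Matches zero or one occurrence
|   Means OR (matches either of the patterns it separates)
*   Any number of occurrences (including zero)
+   One or more occurrences
{}  Indicates the number of occurrences of the preceding RE
()  Encloses a group of REs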
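
As a rough illustration of several of the metacharacters listed above, here is a minimal sketch. The sample strings and variable names are made up for this example and are not part of the original article.

```python
import re

text = "Hello world"

# ^ anchors the pattern to the start of the string, $ to the end.
starts_with_hello = bool(re.search(r'^Hello', text))
ends_with_world = bool(re.search(r'world$', text))

# . matches any single character except a newline,
# so the last item (h, newline, t) is not matched.
dot_matches = re.findall(r'h.t', 'hat hot hit h\nt')

# ? makes the preceding item optional (zero or one occurrence).
optional_u = re.findall(r'colou?r', 'color colour')

# | means "or".
either = re.findall(r'cat|dog', 'cat, dog, bird')

# () groups 'ab' so that {2} repeats the whole group exactly twice.
repeated = re.search(r'(ab){2}', 'ababab').group()

print(starts_with_hello, ends_with_world)
print(dot_matches)
print(optional_u)
print(either)
print(repeated)
```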

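The function compile() compiles a regular expression into a pattern object, which has methods for various operations such as searching for pattern matches or performing string substitutions.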

# Import the regular expression module.
import re
# compile() builds a pattern from the character class [a-e], which is equivalent to [abcde].
# The class [abcde] matches any single 'a', 'b', 'c', 'd' or 'e'.
p = re.compile('[a-e]')
# findall() searches the string and returns all matches as a list.
print(p.findall("Aye, said Mr. Gibenson Stark"))

Output:

['e', 'a', 'd', 'b', 'e', 'a']

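Understanding the output:
The first match is the 'e' in "Aye", not the 'A', because matching is case-sensitive.
The next matches are the 'a' and then the 'd' in "said", followed by the 'b' and 'e' in "Gibenson"; the final 'a' comes from "Stark".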
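
Since a compiled pattern object carries several methods, one pattern can be reused for searching and for substitution. The following is a minimal sketch of that idea; the vowel pattern and the variable names are assumptions chosen for illustration.

```python
import re

# Compile once, then reuse the same pattern object for several operations.
vowels = re.compile('[aeiou]')

sentence = "Aye, said Mr. Gibenson Stark"

first = vowels.search(sentence)        # first match only, as a match object
all_found = vowels.findall(sentence)   # every match, as a list of strings
masked = vowels.sub('*', sentence)     # every match replaced with '*'

print(first.group(), first.start())
print(all_found)
print(masked)
```

The uppercase 'A' is left alone in all three results because the class only lists lowercase vowels.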

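The backslash metacharacter '\' plays a very important role because it signals various special sequences. To match a backslash itself, with no special meaning, use '\\' in the pattern.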
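
The escaping rule can be sketched as follows. This example assumes Python raw strings (the r prefix) are used for patterns so that the backslashes reach the regex engine unchanged; the sample path is hypothetical.

```python
import re

# r'\.' reaches the regex engine as backslash + dot,
# which matches a literal '.' only.
literal_dot = re.findall(r'\.', 'A.M. on 4th')

# Unescaped, '.' is a metacharacter and matches every character here.
any_char = re.findall(r'.', 'A.M.')

# To match a literal backslash, the pattern needs two backslashes.
# The raw string r'\\' and the plain string '\\\\' are the same pattern.
path = 'C:\\temp\\file.txt'            # the text C:\temp\file.txt
backslashes = re.findall(r'\\', path)
same_pattern = (r'\\' == '\\\\')

print(literal_dot)
print(any_char)
print(len(backslashes), same_pattern)
```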

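\d   Matches any decimal digit; this is equivalent
     to the class [0-9].
\D   Matches any non-digit character.
\s   Matches any whitespace character.
\S   Matches any non-whitespace character.
\w   Matches any alphanumeric character; this is
     equivalent to the class [a-zA-Z0-9_].
\W   Matches any non-alphanumeric character.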

The class [\s,.] matches any whitespace character, ',' or '.'.
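
A small sketch of such a combined class, using a made-up sample sentence, might look like this:

```python
import re

# A character class can mix a special sequence (\s) with literal characters.
separators = re.compile(r'[\s,.]')

found = separators.findall("Yes, no. Maybe so")
tab_found = separators.findall("a\tb")   # \s also covers the tab character

print(found)
print(tab_found)
```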

import re
# \d is equivalent to [0-9].
p = re.compile(r'\d')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))
# \d+ matches a run of one or more digits.
p = re.compile(r'\d+')
print(p.findall("I went to him at 11 A.M. on 4th July 1886"))

Output:

['1', '1', '4', '1', '8', '8', '6']
['11', '4', '1886']

import re
# \w is equivalent to [a-zA-Z0-9_].
p = re.compile(r'\w')
print(p.findall("He said * in some_lang."))
# \w+ matches a run of one or more alphanumeric characters.
p = re.compile(r'\w+')
print(p.findall("I went to him at 11 A.M., he said *** in some_language."))
# \W matches any non-alphanumeric character.
p = re.compile(r'\W')
print(p.findall("he said *** in some_language."))

Output:

['H', 'e', 's', 'a', 'i', 'd', 'i', 'n', 's', 'o', 'm', 'e', '_', 'l', 'a', 'n', 'g']
['I', 'went', 'to', 'him', 'at', '11', 'A', 'M', 'he', 'said', 'in', 'some_language']
[' ', ' ', '*', '*', '*', ' ', ' ', '.']

import re
p = re.compile('ab*')
print(p.findall("ababbaabbb"))

Output:

['ab', 'abb', 'a', 'abbb']

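Understanding the output:
Our RE is ab*, that is, an 'a' followed by any number of 'b's, including none.
'ab' is a valid match: one 'a' followed by a single 'b'.
'abb' is valid: one 'a' followed by two 'b's.
'a' is valid: one 'a' followed by zero 'b's.
'abbb' is valid: one 'a' followed by three 'b's.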
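
To see how the other repetition metacharacters behave on the same input, the following sketch compares *, +, ? and {} side by side. The variable names are only illustrative.

```python
import re

text = "ababbaabbb"

star = re.findall(r'ab*', text)        # 'a' followed by zero or more 'b'
plus = re.findall(r'ab+', text)        # 'a' followed by one or more 'b'
optional = re.findall(r'ab?', text)    # 'a' followed by at most one 'b'
braces = re.findall(r'ab{2,3}', text)  # 'a' followed by two or three 'b'

print(star)
print(plus)
print(optional)
print(braces)
```

The lone 'a' disappears under + because at least one 'b' is required, and under {2,3} the first 'ab' is dropped as well.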

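Function split()
Splits a string at each occurrence of a character or pattern; the pieces between the matches are returned as the elements of the resulting list.
Syntax:

 re.split(pattern, string, maxsplit=0, flags=0)

The first parameter, pattern, is the regular expression; string is the given string in which pattern is searched for and at which splitting occurs. If maxsplit is not provided it defaults to 0, which means no limit; if a nonzero value is given, at most that many splits occur. With maxsplit = 1 the string is split only once, producing a list of length 2. The flags are optional and can help shorten the code; for example, with flags = re.IGNORECASE, case is ignored when splitting.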

from re import split
# '\W+' matches a run of one or more non-alphanumeric characters.
# On finding ',' or a space ' ', split() splits the string at that point.
print(split(r'\W+', 'Words, words , Words'))
print(split(r'\W+', "Word's words Words"))
# Here ':', ' ' and ',' are not alphanumeric, so the string is split at those points.
print(split(r'\W+', 'On 12th Jan 2016, at 11:02 AM'))
# '\d+' matches a run of one or more digits.
# Splitting occurs at '12', '2016', '11', '02' only.
print(split(r'\d+', 'On 12th Jan 2016, at 11:02 AM'))

Output:

['Words', 'words', 'Words']
['Word', 's', 'words', 'Words']
['On', '12th', 'Jan', '2016', 'at', '11', '02', 'AM']
['On ', 'th Jan ', ', at ', ':', ' AM']

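Example:

import re
# The split happens only once, at '12', so the returned list has length 2.
print(re.split(r'\d+', 'On 12th Jan 2016, at 11:02 AM', maxsplit=1))
# With flags = re.IGNORECASE, case is ignored, so the 'A' in 'Aey' and the 'B' in 'Boy' match [a-f] just like 'b' in 'boy'.
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here', flags = re.IGNORECASE))
# Without the flag, only lowercase a to f match.
print(re.split('[a-f]+', 'Aey, Boy oh boy, come here'))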

Output:

['On ', 'th Jan 2016, at 11:02 AM']
['', 'y, ', 'oy oh ', 'oy, ', 'om', ' h', 'r', '']
['A', 'y, Boy oh ', 'oy, ', 'om', ' h', 'r', '']
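
One place where maxsplit and flags tend to be handy is pulling a key and a value out of a line. The sketch below assumes a hypothetical "key = value" line format that is not part of the original article.

```python
import re

# Hypothetical input: only the first '=' separates the key from the value.
line = "name = Mango = fruit"

# maxsplit=1 stops after the first separator, so the value keeps its own '='.
key, value = re.split(r'\s*=\s*', line, maxsplit=1)

# flags=re.IGNORECASE lets one pattern cover 'AND', 'and', 'And', ...
items = re.split(r'\s+and\s+', 'Spam AND eggs and ham', flags=re.IGNORECASE)

print(key)
print(value)
print(items)
```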

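Function sub()
Syntax:

 re.sub(pattern, repl, string, count=0, flags=0)

sub() substitutes substrings: it searches the given string (3rd parameter) for the regular expression pattern and replaces each match with repl (2nd parameter). count limits how many replacements are made; the default of 0 replaces every match.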

import re
# The pattern 'ub' matches the string at "Subject" and at "Uber".
# Because case is ignored when the flag is used, 'ub' matches twice.
# After matching, 'ub' in "Subject" and 'Ub' in "Uber" are both replaced with '~*'.
print(re.sub('ub', '~*' , 'Subject has Uber booked already', flags = re.IGNORECASE))
# Matching is case-sensitive here, so the 'Ub' in "Uber" is not replaced.
print(re.sub('ub', '~*' , 'Subject has Uber booked already'))
# Because count is 1, at most one replacement is made.
print(re.sub('ub', '~*' , 'Subject has Uber booked already', count=1, flags = re.IGNORECASE))
# The 'r' prefix marks a raw string; each \s matches the whitespace on either side of AND.
print(re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE))

Output:

S~*ject has ~*er booked already
S~*ject has Uber booked already
S~*ject has Uber booked already
Baked Beans & Spam
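
As a further sketch of how count changes the result, the example below collapses runs of whitespace. The messy input string is invented for illustration.

```python
import re

# Hypothetical input with irregular spacing, including a tab.
messy = 'Baked   Beans \t And  Spam'

# With the default count=0, every run of whitespace is replaced.
collapsed = re.sub(r'\s+', ' ', messy)

# With count=1, only the first run is replaced; the rest is untouched.
first_only = re.sub(r'\s+', ' ', messy, count=1)

print(collapsed)
print(repr(first_only))
```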

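Function subn()
Syntax:

 re.subn(pattern, repl, string, count=0, flags=0)

subn() is similar to sub() in every respect except the form of its result. Instead of just the string, it returns a tuple containing the new string and the total number of replacements made.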

import re
print(re.subn('ub', '~*' , 'Subject has Uber booked already'))
t = re.subn('ub', '~*' , 'Subject has Uber booked already', flags = re.IGNORECASE)
print(t)
print('Length of Tuple is: ', len(t))
# t[0] is the same string that sub() would return.
print(t[0])

Output:

('S~*ject has Uber booked already', 1)
('S~*ject has ~*er booked already', 2)
Length of Tuple is:  2
S~*ject has ~*er booked already
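
Because the result is a two-element tuple, it can be unpacked directly. A minimal sketch, with variable names chosen only for this example:

```python
import re

source = 'Subject has Uber booked already'

# Unpack the (new_string, number_of_replacements) tuple in one step.
new_text, n = re.subn('ub', '~*', source, flags=re.IGNORECASE)

# When nothing matches, the string comes back unchanged with a count of 0,
# which gives a simple way to check whether any replacement happened.
unchanged, zero = re.subn('xyz', '-', source)

print(new_text, n)
print(unchanged, zero)
```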

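Function escape()
Syntax:

re.escape(string)

Returns the string with a backslash inserted before each non-alphanumeric character. This is useful when you want to match an arbitrary literal string that may contain regular expression metacharacters. Since Python 3.7, only characters that can have a special meaning in a regular expression are escaped, so characters such as ',' are left as they are.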

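import re
# escape() returns the string with a backslash '\' before each escaped character.
# In the first case, only the space ' ' is escaped.
# In the second case, the space ' ', the caret '^', '-', '[', ']' and the tab produced by '\t' are escaped.
print(re.escape("This is Awseome even 1 AM"))
print(re.escape("I Asked what is this [a-9], he said \t ^WoW"))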

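Output (Python 3.7 and later, where ',' is no longer escaped; the escaped tab appears as a backslash followed by whitespace):

This\ is\ Awseome\ even\ 1\ AM
I\ Asked\ what\ is\ this\ \[a\-9\],\ he\ said\ \ \ \^WoW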
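
The typical use of escape() is to search for a literal string that happens to contain metacharacters. The sketch below assumes a hypothetical search term, '[a-9]', which is not even a valid pattern on its own.

```python
import re

needle = "[a-9]"                        # literal text we want to find
haystack = "what is this [a-9], he said"

# Used directly as a pattern, '[a-9]' is an invalid character range.
try:
    re.compile(needle)
    compiled_unescaped = True
except re.error:
    compiled_unescaped = False

# After escaping, the brackets and the hyphen are matched literally.
pattern = re.compile(re.escape(needle))
match = pattern.search(haystack)

print(compiled_unescaped)
print(match.group(), match.start())
```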