Python Regex Metacharacters
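Metacharacters are the building blocks of regular expressions. A regular expression is a pattern used to match character combinations in a string. Metacharacters carry special meaning inside a pattern and are mainly used to define search criteria and text-manipulation operations.
Some of the most commonly used metacharacters and their uses are listed below (in the examples, _ stands for the empty string):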
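Metacharacter | Description | Example |
---|---|---|
\d | any single digit (0-9) | \d = 7, \d\d = 77 |
\w | any single word character (letter, digit, or underscore) | \w\w\w\w = geek, \w\w\w != geek |
* | 0 or more repetitions of the preceding element | s* = _, s, ss, sss, ssss, ... |
+ | 1 or more repetitions of the preceding element | s+ = s, ss, sss, ssss, ... |
? | 0 or 1 repetition of the preceding element | s? = _ or s |
{m} | exactly m repetitions of the preceding element | sd{3} = sddd |
{m,n} | at least m and at most n repetitions of the preceding element | sd{2,3} = sdd or sddd |
\W | any single non-word character (symbols, whitespace, punctuation) | \W = % |
[a-z] or [0-9] | character set: any one character from the listed set or range | geek[sy] = geeks or geeky, geek[sy] != geeki |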
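The character-class rows of the table can be checked with a short sketch. The sample strings below are hypothetical and chosen only for illustration; `re.findall` is the standard-library call that returns every non-overlapping match.

```python
import re

# Hypothetical sample text, used only to illustrate the table rows.
sample = "geek_42 scored 97%!"

digits = re.findall(r"\d", sample)        # \d matches one digit at a time
two_digits = re.findall(r"\d\d", sample)  # \d\d matches two adjacent digits
words = re.findall(r"\w+", sample)        # \w covers letters, digits, and underscore
non_word = re.findall(r"\W", sample)      # \W is everything \w does not match

# A character set matches exactly one character from the set.
geek_sy = re.findall(r"geek[sy]", "geeks geeky geeki")

print(digits)      # ['4', '2', '9', '7']
print(two_digits)  # ['42', '97']
print(words)       # ['geek_42', 'scored', '97']
print(non_word)    # [' ', ' ', '%', '!']
print(geek_sy)     # ['geeks', 'geeky']
```

Note that `\W` also matches the spaces in the sample, not only symbols such as `%`.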
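Regular expressions are built from metacharacters, and the resulting patterns can be processed with Python's built-in regular expression library, re.
import re  # import Python's regular expression module
The library can be used to compile patterns, search for patterns, and so on.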
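As a minimal sketch of what compiling and searching look like in practice, the snippet below compiles one pattern and reuses it for three common operations. The sample text and variable names are assumptions made for illustration; `re.compile`, `search`, `findall`, and `sub` are standard `re` calls.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "order 66 shipped, order 7 pending"

number = re.compile(r"\d+")         # compile once, reuse many times

first = number.search(text)         # first match object, or None if no match
all_numbers = number.findall(text)  # list of every non-overlapping match
masked = number.sub("#", text)      # replace each match with '#'

print(first.group())  # 66
print(all_numbers)    # ['66', '7']
print(masked)         # order # shipped, order # pending
```

Compiling is optional for one-off searches, since module-level functions such as `re.findall(pattern, text)` accept the pattern string directly.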
Example: the code below finds every match of each given regular expression in a test string.
Python3
import re

# Metacharacters used below:
#   *     - 0 or more
#   +     - 1 or more
#   ?     - 0 or 1
#   {m}   - exactly m times
#   {m,n} - min m and max n times

test_phrase = 'sddsd..sssddd...sdddsddd...dsds...dsssss...sdddd'

test_patterns = [
    r'sd*',      # s followed by zero or more d's
    r'sd+',      # s followed by one or more d's
    r'sd?',      # s followed by zero or one d
    r'sd{3}',    # s followed by exactly three d's
    r'sd{2,3}',  # s followed by two to three d's
]


def multi_re_find(test_patterns, test_phrase):
    # Compile each pattern, then print every non-overlapping match in the phrase.
    for pattern in test_patterns:
        compiled_pattern = re.compile(pattern)
        print('finding {} in test_phrase'.format(pattern))
        print(compiled_pattern.findall(test_phrase))


multi_re_find(test_patterns, test_phrase)
Output:
finding sd* in test_phrase
['sdd', 'sd', 's', 's', 'sddd', 'sddd', 'sddd', 'sd', 's', 's', 's', 's', 's', 's', 'sdddd']
finding sd+ in test_phrase
['sdd', 'sd', 'sddd', 'sddd', 'sddd', 'sd', 'sdddd']
finding sd? in test_phrase
['sd', 'sd', 's', 's', 'sd', 'sd', 'sd', 'sd', 's', 's', 's', 's', 's', 's', 'sd']
finding sd{3} in test_phrase
['sddd', 'sddd', 'sddd', 'sddd']
finding sd{2,3} in test_phrase
['sdd', 'sddd', 'sddd', 'sddd', 'sddd']
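The sd{3} result may look surprising, because the phrase ends in 'sdddd' (four d's) yet still yields 'sddd'. The sketch below, which uses `re.finditer` to show where each match starts and ends, suggests why: the pattern searches inside the string, so it matches the first three d's and leaves the fourth unmatched. The `spans` name is introduced here only for illustration.

```python
import re

test_phrase = 'sddsd..sssddd...sdddsddd...dsds...dsssss...sdddd'

# Record (start, end, matched text) for every match of sd{3}.
spans = [(m.start(), m.end(), m.group())
         for m in re.finditer(r'sd{3}', test_phrase)]

for start, end, matched in spans:
    print(start, end, matched)

# The last match stops one character before the end of the phrase,
# so the final 'd' is not part of any match.
print(len(test_phrase) - spans[-1][1])  # 1

# fullmatch requires the whole string to fit the pattern, so four d's fail.
print(re.fullmatch(r'sd{3}', 'sdddd'))  # None
print(re.fullmatch(r'sd{3}', 'sddd') is not None)  # True
```

When an exact count must hold for the entire string, `re.fullmatch` is one way to enforce it instead of `findall`.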