PHP |常用表达
正则表达式通常称为正则表达式 (regexes),是一系列字符,以文本字符串的形式描述特殊搜索模式。它们基本上用于编程世界算法中,用于匹配一些松散定义的模式以实现一些相关任务。有时,正则表达式被理解为一种带有模式符号的迷你编程语言,允许用户解析文本字符串。字符的确切序列事先是不可预测的,因此正则表达式有助于根据模式定义获取所需的字符串。
正则表达式是一种描述匹配特定数量文本的字符串模式的紧凑方式。如您所知, PHP是一种常用于网站创建的开源语言,它提供正则表达式功能作为重要工具。像PHP一样,许多其他编程语言都有自己的正则表达式实现。其他应用程序也是如此,它们自己支持具有各种语法的正则表达式。许多可用的现代语言和工具将正则表达式应用于非常大的文件和字符串。让我们看看正则表达式在我们的应用程序中的一些优点和用途。
正则表达式的优点和用途:
在许多情况下,每当在自由文本字段中收集数据时,开发人员都会遇到问题,因为大多数编程都处理数据条目。正则表达式在当今的应用程序编程中几乎无处不在。
- 正则表达式有助于验证程序员感兴趣的文本字符串。
- 它为分析、搜索模式和修改文本数据提供了强大的工具。
- 它有助于以灵活的方式搜索特定的字符串模式并提取匹配结果。
- 它有助于解析文本文件,寻找定义的字符序列以进行进一步分析或数据操作。
- 借助内置的正则表达式函数,为识别模式提供了简单而简单的解决方案。
- 它有效地节省了大量的开发时间,即寻找特定的字符串模式。
- 它有助于重要的用户信息验证,如电子邮件地址、电话号码和 IP 地址。
- 它有助于根据搜索结果或输入突出显示文件中的特殊关键字。
- 它有助于识别特定的模板标签并根据要求将这些数据替换为实际数据。
- 正则表达式对于创建 HTML 模板系统识别标签非常有用。
- 正则表达式主要用于浏览器检测、垃圾邮件过滤、检查密码强度和表单验证。
我们无法涵盖该主题下的所有内容,但让我们研究一些主要的正则表达式概念。下表显示了一些正则表达式以及与正则表达式模式匹配的相应字符串。
Regular Expression | Matches |
---|---|
geeks | The string “geeks” |
^geeks | The string which starts with “geeks” |
geeks$ | The string which have “geeks” at the end. |
^geeks$ | The string where “geeks” is alone on a string. |
[abc] | a, b, or c |
[a-z] | Any lowercase letter |
[^A-Z] | Any letter which is NOT a uppercase letter |
(gif|png) | Either “gif” or “png” |
[a-z]+ | One or more lowercase letters |
^[a-zA-Z0-9]{1, }$ | Any word with at least one number or one letter |
([ax])([by]) | ab, ay, xb, xy |
[^A-Za-z0-9] | Any symbol other than a letter or other than number |
([A-Z]{3}|[0-9]{5}) | Matches three letters or five numbers |
注意:可以通过应用一些基本的正则表达式规则来创建复杂的搜索模式。甚至许多算术运算运算符(如 +、^、-)也被正则表达式用于创建小复杂模式。
正则表达式中的运算符:让我们看看PHP正则表达式中的一些运算符。
Operator | Description |
---|---|
^ | It denotes the start of string. |
$ | It denotes the end of string. |
. | It denotes almost any single character. |
() | It denotes a group of expressions. |
[] | It finds a range of characters for example [xyz] means x, y or z . |
[^] | It finds the items which are not in range for example [^abc] means NOT a, b or c. |
– (dash) | It finds for character range within the given item range for example [a-z] means a through z. |
| (pipe) | It is the logical OR for example x | y means x OR y. |
? | It denotes zero or one of preceding character or item range. |
* | It denotes zero or more of preceding character or item range. |
+ | It denotes one or more of preceding character or item range. |
{n} | It denotes exactly n times of preceding character or item range for example n{2}. |
{n, } | It denotes atleast n times of preceding character or item range for example n{2, }. |
{n, m} | It denotes atleast n but not more than m times for example n{2, 4} means 2 to 4 of n. |
\ | It denotes the escape character. |
正则表达式中的特殊字符类:让我们看看正则表达式中使用的一些特殊字符。
Special Character | Meaning |
---|---|
\n | It denotes a new line. |
\r | It denotes a carriage return. |
\t | It denotes a tab. |
\v | It denotes a vertical tab. |
\f | It denotes a form feed. |
\xxx | It denotes octal character xxx. |
\xhh | It denotes hex character hh. |
速记字符集:让我们看看一些可用的速记字符集。
Shorthand | Meaning |
---|---|
\s | Matches space characters like space, newline or tab. |
\d | Matches any digit from 0 to 9. |
\w | Matches word characters including all lower and upper case letters, digits and underscore. |
预定义函数或正则表达式库:让我们看一下PHP中正则表达式的预定义函数的快速备忘单。 PHP为程序员提供了许多有用的函数来处理正则表达式。
下面列出的内置函数区分大小写。
Function | Definition |
---|---|
preg_match() | This function searches for a specific pattern against some string. It returns true if pattern exists and false otherwise. |
preg_match_all() | This function searches for all the occurrences of string pattern against the string. This function is very useful for search and replace. |
ereg_replace() | This function searches for specific string pattern and replace the original string with the replacement string, if found. |
eregi_replace() | The function behaves like ereg_replace() provided the search for pattern is not case sensitive. |
preg_replace() | This function behaves like ereg_replace() function provided the regular expressions can be used in the pattern and replacement strings. |
preg_split() | The function behaves like the PHP split() function. It splits the string by regular expressions as its parameters. |
preg_grep() | This function searches all elements which matches the regular expression pattern and returns the output array. |
preg_quote() | This function takes string and quotes in front of every character which matches the regular expression. |
ereg() | This function searches for a string which is specified by a pattern and returns true if found, otherwise returns false. |
eregi() | This function behaves like ereg() function provided the search is not case sensitive. |
笔记:
- 默认情况下,正则表达式区分大小写。
- PHP中单引号内的字符串和双引号内的字符串是有区别的。前者按字面意思处理,而双引号内的字符串意味着打印变量的内容,而不仅仅是打印它们的名称。
示例 1:
输出:
Name string matching with regular expression
示例 2:
(.*)<\/b>/U";
// Declare a string
$inputString = "Name: John Position: Developer";
// Use preg_match_all() function to perform
// a global regular expression match
preg_match_all($regex, $inputString, $output);
echo $output[0][0]."
".$output[0][1]."\n";
?>
输出:
John
Developer
示例 3:
输出:
Completed graduation in 2002
示例 4:
";
echo "$output[1]
";
echo "$output[2]
";
echo "$output[3]
";
?>
输出:
134
645
478
670
元字符:正则表达式中使用了两种字符:正则字符和元字符。正则字符是那些字符“字面量”含义的字符,元字符是那些在正则表达式中具有“特殊”含义的字符。
Metacharacter | Description | Example |
---|---|---|
. | It matches any single character other than a new line. | /./ matches string which has a single character. |
^ | It matches the beginning of string. | /^geeks/ matches any string that starts with geeks. |
$ | It matches the string pattern at the end of the string. | /com$/ matches string ending with com for example google.com etc. |
* | It matches zero or more characters. | /com*/ matches commute, computer, compromise etc. |
+ | It matches preceding character appear atleast once. | For example /z+oom/ matches zoom. |
\ | It is used to esacape metacharacters in regex. | /google\.com/ will treat the dot as a literal value, not metacharacter. |
a-z | It matches lower case letters. | geeks |
A-Z | It matches upper case letters. | GEEKS |
0-9 | It matches any number between 0 and 9. | /0-5/ matches 0, 1, 2, 3, 4, 5 |
[…] | It matches character class. | /[pqr]/ matches pqr |
其他示例:
Regular expression | Meaning |
---|---|
^[.-a-z0-9A-Z] | It matches string with dot, dash and any lower case letters, numbers between 0 and 9 and upper case letters. |
+@[a-z0-9A-Z] | It matches string with @ symbol in the beginning followed by any lower case letters, numbers between 0 and 9 and upper case letters. |
+\.[a-z]{2, 6}$/ | It escapes the dot and then matches string with any lower case letters with string length between 2 and 6 at the end. |
笔记:
- 元字符在正则表达式模式匹配解决方案中非常强大。它处理许多复杂的模式处理。
- 每个不是元字符的字符都绝对是常规字符。
- 每个常规字符本身都匹配同一个字符。
POSIX 正则表达式: PHP中的一些正则表达式类似于算术表达式,称为 POSIX 正则表达式。有时,通过组合正则表达式中的各种元素或运算符来创建复杂的表达式。非常基本的正则表达式是匹配单个字符的。
让我们看看一些 POSIX 正则表达式。
Regex | Meaning |
---|---|
[0-9] | It matches digit from 0 through 9. |
[a-z] | It matches any lowercase letter from a through z. |
[A-Z] | It matches any uppercase letter from A through Z. |
[a-Z] | It matches any lowercase letter a through uppercase letter Z. |
[:lower:] | It matches any lower case letters. |
[:upper:] | It matches any upper case letters. |
[:alpha:] | It matches all alphabetic characters or letters from a-z and A-Z. |
[[:alpha:]] | It matches any string containing alphabetic characters or letters. |
[:alnum:] | It matches all alphanumeric characters i.e all digits(0-9) and letters (a-z A-Z). |
[[:alnum:]] | It matches any string containing alphanumeric characters and digits. |
[:digit:] | It matches all the digits from 0 to 9. |
[[:digit:]] | It matches any string containing digits from 0 to 9. |
[:xdigit:] | It matches all the hexadecimal digits. |
[:punct:] | It matches all the punctuation symbols. |
[:blank:] | It matches blank characters like space and tab. |
[:space:] | It matches all whitespace characters like line breaks. |
[[:space:]] | It matches any string containing a space. |
[:cntrl:] | It matches all control characters. |
[:graph:] | It matches all visible or printed characters other than spaces and control characters. |
[:print:] | It matches all printed characters and spaces other than control characters. |
[:word:] | It matches all word characters like digits, letters and underscore. |
正则表达式中的量词:量词是特殊字符,它告诉括号字符或字符组的实例或出现的数量、频率或数量。它们也被称为贪婪和懒惰的表达式。让我们来看看量词的一些概念和例子。
Quantifier | Meaning |
---|---|
a+ | It matches the string containing at least one a. |
a* | It matches the string containing zero or more a’s. |
a? | It matches any string containing zero or one a’s. |
a{x} | It matches letter ‘a’ exaclty x times . |
a{2, 3} | It matches any string containing the occurrence of two or three a’s. |
a{2, } | It matches any string containing the occurrence of at least two a’s. |
a{2} | It matches any string containing the occurrence of exactly two a’s. |
a{, y} | It matches any string containing the occurrence of not more than y number of a’s. |
a$ | It matches any string with ‘a’ at the end of it. |
^a | It matches any string with ‘a’ at the beginning of it. |
[^a-zA-Z] | It matches any string pattern not having characters from a to z and A to Z. |
a.a | It matches any string pattern containing a, then any character and then another a. |
^.{3}$ | It matches any string having exactly three characters. |
笔记:
- 表达式中的 $字符将匹配目标字符串的结尾。
- 正则表达式中的 *、?、+ 符号表示字符出现的频率。如果它发生零次或多次,零次或一次,一次或多次。
- 表达式中的 ^字符将匹配目标字符串的开头。
- 这 。元字符匹配除换行符以外的任何单个字符。
常见的正则表达式引擎:
- 正则表达式
- 正则表达式好友
结论:正则表达式是以特定模式描述某些字符串文本的模式,或者它被定义为以单行表示的模式匹配算法。正则表达式在编程世界中对于验证检查和识别特定模板非常有用。 PHP提供了许多支持正则表达式的内置函数。元字符有助于创建复杂的模式。