Unix / Linux - Regular Expressions with SED

Last modified: 2020-10-31 14:57:59 | Author: Mango

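In this chapter, we will discuss in detail regular expressions with SED in Unix.

A regular expression is a string that can be used to describe several sequences of characters. Regular expressions are used by several different Unix commands, including ed, sed, awk, grep, and, to a more limited extent, vi.

Here SED stands for stream editor. This stream-oriented editor was created exclusively for executing scripts. All the input you feed into it passes through and goes to STDOUT, and the input file is not changed.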
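A minimal sketch of this behaviour, assuming a throwaway file created with mktemp (the file and its two lines are made up for illustration and are not part of the tutorial's data):

```shell
# Create a scratch file with hypothetical sample data.
tmpfile=$(mktemp)
printf 'root:x:0:0\ndaemon:x:1:1\n' > "$tmpfile"

# sed writes the edited text to STDOUT...
edited=$(sed 's/root/amrood/' "$tmpfile")
printf '%s\n' "$edited"

# ...while the file on disk still holds the original text.
original=$(cat "$tmpfile")
printf '%s\n' "$original"

rm -f "$tmpfile"
```

To keep the edited text, redirect STDOUT to a new file rather than to the file being read.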

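Invoking sed

Before we start, let us make sure we have a local copy of the /etc/passwd text file to work with sed.

As mentioned previously, sed can be invoked by sending data to it through a pipe, as follows: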

$ cat /etc/passwd | sed
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
  -e script, --expression=script
...............................

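The cat command dumps the contents of /etc/passwd to sed through the pipe into sed's pattern space. The pattern space is the internal work buffer that sed uses for its operations.

The sed General Syntax

Following is the general syntax for sed:

/pattern/action

Here, pattern is a regular expression, and action is one of the commands given in the following table. If pattern is omitted, action is performed for every line.

The slash characters (/) that surround the pattern are required because they are used as delimiters.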

Sr.No.	Command & Description
1	p Prints the line
2	d Deletes the line
3	s/pattern1/pattern2/ Substitutes the first occurrence of pattern1 with pattern2
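The three commands in the table can be sketched against a small in-line sample. The three passwd-style lines below are illustrative stand-ins, and the -n option (which suppresses automatic printing) is used so that p prints only the matching line:

```shell
# Hypothetical three-line sample standing in for a passwd-style file.
sample='root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync'

# /pattern/p with -n: print only the lines that match the pattern.
printed=$(printf '%s\n' "$sample" | sed -n '/daemon/p')

# /pattern/d: delete the matching lines and print the rest.
deleted=$(printf '%s\n' "$sample" | sed '/daemon/d')

# /pattern/s/old/new/: substitute only on the matching lines.
substituted=$(printf '%s\n' "$sample" | sed '/^sync/s/bin/BIN/')

printf '%s\n---\n%s\n---\n%s\n' "$printed" "$deleted" "$substituted"
```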

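Deleting All Lines with sed

We will now see how to delete all lines with sed. Invoke sed again, but this time have it use the delete line editing command, denoted by the single letter d: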

$ cat /etc/passwd | sed 'd'
$

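Instead of invoking sed by sending a file to it through a pipe, sed can be instructed to read the data from a file, as in the following example.

The following command does exactly the same as the previous example, without the cat command: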

$ sed -e 'd' /etc/passwd
$

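The sed Addresses

sed also supports addresses. An address is either a particular location in a file or a range where a particular editing command should be applied. When sed encounters no addresses, it performs its operations on every line in the file.

The following command adds a basic address to the sed command you have been using: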

$ cat /etc/passwd | sed '1d' |more
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$

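Notice that the number 1 is added before the delete edit command. This instructs sed to perform the editing command on the first line of the file. In this example, sed deletes the first line of /etc/passwd and prints the rest of the file.

The sed Address Ranges

We will now see how to work with sed address ranges. What if you want to remove more than one line from a file? You can specify an address range with sed as follows: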

$ cat /etc/passwd | sed '1,5d' |more
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$

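The above command is applied to all the lines from 1 through 5. This deletes the first five lines.

Try out the following address ranges: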

Sr.No.	Range & Description
1	'4,10d' Lines starting from the 4th till the 10th are deleted
2	'10,4d' Only the 10th line is deleted, because sed does not work in the reverse direction
3	'4,+5d' This matches line 4 in the file, deletes that line, continues to delete the next five lines, and then ceases its deletion and prints the rest (the addr,+N form is a GNU sed extension)
4	'2,5!d' This deletes everything except the 2nd through 5th lines
5	'1~3d' This deletes the first line, steps over the next two lines, and then deletes the fourth line. Sed continues to apply this pattern until the end of the file (the first~step form is a GNU sed extension)
6	'2~2d' This tells sed to delete the second line, step over the next line, delete the next line, and repeat until the end of the file is reached
7	'4,10p' Lines starting from the 4th till the 10th are printed
8	'4,d' This generates a syntax error
9	',10d' This would also generate a syntax error
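The ranges in the table can be tried on twelve numbered lines, where each line's text is its own line number, so the surviving lines are easy to read off. This is a sketch that assumes GNU sed (needed for the 4,+5 and first~step forms); the run helper is a hypothetical convenience function, not part of sed:

```shell
# Twelve lines; each line contains its own line number.
nums=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# Hypothetical helper: apply one sed script and join the output on one line.
run() {
    printf '%s\n' "$nums" | sed "$1" | paste -s -d ' ' -
}

r1=$(run '4,10d')   # lines 4 through 10 deleted
r2=$(run '10,4d')   # only line 10 deleted
r3=$(run '4,+5d')   # line 4 and the next five lines deleted (GNU)
r4=$(run '2,5!d')   # everything except lines 2 through 5 deleted
r5=$(run '1~3d')    # lines 1, 4, 7, 10 deleted (GNU)
r6=$(run '2~2d')    # every even-numbered line deleted (GNU)

printf '%s\n' "$r1" "$r2" "$r3" "$r4" "$r5" "$r6"
```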

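Note: While using the p action, you should use the -n option to avoid repetition of line printing. Check the difference between the following two commands: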

$ cat /etc/passwd | sed -n '1,3p'
Check the above command without -n as follows −
$ cat /etc/passwd | sed '1,3p'
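The difference is easiest to see by counting output lines. A small sketch on four made-up lines:

```shell
lines=$(printf '%s\n' alpha beta gamma delta)

# With -n, only the lines selected by p are printed.
with_n=$(printf '%s\n' "$lines" | sed -n '1,3p')

# Without -n, every line is printed once automatically,
# and lines 1 through 3 are printed a second time by p.
without_n=$(printf '%s\n' "$lines" | sed '1,3p')

printf 'with -n:\n%s\n\nwithout -n:\n%s\n' "$with_n" "$without_n"
```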

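The Substitution Command

The substitution command, denoted by s, substitutes any string that you specify with any other string that you specify.

To substitute one string with another, sed needs to know where the first string ends and where the substitution string begins. For this, we bookend the two strings with the forward slash (/) character.

The following command substitutes the first occurrence on a line of the string root with the string amrood.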

$ cat /etc/passwd | sed 's/root/amrood/'
amrood:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
..........................

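It is very important to note that sed substitutes only the first occurrence on a line. If the string root occurs more than once on a line, only the first match is replaced.

For sed to perform a global substitution, add the letter g to the end of the command, as follows: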

$ cat /etc/passwd | sed 's/root/amrood/g'
amrood:x:0:0:amrood user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
...........................
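The two behaviours can be compared side by side on a single line. This sketch feeds the root entry shown above to sed directly, so it does not depend on the contents of a real /etc/passwd:

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/root/amrood/')

# With g: every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/root/amrood/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
```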

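Substitution Flags

There are a number of other useful flags that can be passed in addition to the g flag, and you can specify more than one at a time.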

Sr.No.	Flag & Description
1	g Replaces all matches, not just the first match
2	NUMBER Replaces only the NUMBERth match
3	p If substitution was made, then prints the pattern space
4	w FILENAME If substitution was made, then writes result to FILENAME
5	I or i Matches in a case-insensitive manner
6	M or m In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline
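A short sketch of three of these flags. The sample strings are made up, and the I flag is assumed to be available as a GNU sed extension:

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# NUMBER: replace only the second match on the line.
second=$(printf '%s\n' "$line" | sed 's/root/amrood/2')

# p together with -n: print only the lines where a substitution happened.
changed=$(printf '%s\n' root daemon | sed -n 's/root/amrood/p')

# I combined with g: case-insensitive, all matches (GNU sed).
anycase=$(printf '%s\n' 'Root ROOT root' | sed 's/root/X/Ig')

printf '%s\n%s\n%s\n' "$second" "$changed" "$anycase"
```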

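Using an Alternative String Separator

Suppose you have to do a substitution on a string that includes the forward slash character. In this case, you can specify a different separator by providing the designated character after the s.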

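$ cat /etc/passwd | sed 's:/root:/amrood:g'
root:x:0:0:root user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

In the above example, we have used : as the delimiter instead of the slash / because we were trying to search for /root instead of the simple root. Only the occurrence preceded by a slash is replaced.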
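As a sketch of the equivalence, the same substitution can be written with a colon, with a pipe character, or with the default slash and escaped slashes inside the pattern; all three give the same result on the sample root entry:

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Colon as the separator.
with_colon=$(printf '%s\n' "$line" | sed 's:/root:/amrood:g')

# Pipe character as the separator.
with_pipe=$(printf '%s\n' "$line" | sed 's|/root|/amrood|g')

# Default slash separator; each literal slash must be escaped.
with_escape=$(printf '%s\n' "$line" | sed 's/\/root/\/amrood/g')

printf '%s\n' "$with_colon"
```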

Replacing with Empty Space

Use an empty substitution string to delete the string root from the /etc/passwd file entirely:

$ cat /etc/passwd | sed 's/root//g'
:x:0:0: user:/:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh

Address Substitution

If you want to substitute the string sh with the string quiet only on line 10, you can specify it as follows:

$ cat /etc/passwd | sed '10s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/quiet

Similarly, to do an address range substitution, you could do something like the following:

$ cat /etc/passwd | sed '1,5s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/quiet
daemon:x:1:1:daemon:/usr/sbin:/bin/quiet
bin:x:2:2:bin:/bin:/bin/quiet
sys:x:3:3:sys:/dev:/bin/quiet
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

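As you can see from the output, the string sh was changed to quiet wherever it occurred in the first five lines (line 5 contains no sh, so it is unchanged), and the rest of the lines were left untouched.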
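The same idea in a self-contained sketch on four shortened, made-up entries. The last command uses a regular expression as the address instead of a line number, which is an addition to what the examples above show:

```shell
sample=$(printf '%s\n' 'root:/bin/sh' 'daemon:/bin/sh' 'bin:/bin/sh' 'sync:/bin/sync')

# Substitute on line 2 only.
line2=$(printf '%s\n' "$sample" | sed '2s/sh/quiet/')

# Substitute on lines 1 through 3.
range=$(printf '%s\n' "$sample" | sed '1,3s/sh/quiet/')

# Substitute only on lines that start with daemon.
by_pattern=$(printf '%s\n' "$sample" | sed '/^daemon/s/sh/quiet/')

printf '%s\n---\n%s\n' "$line2" "$range"
```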

The Matching Command

You would use the p command along with the -n option to print all the matching lines, as follows:

$ cat testing | sed -n '/root/p'
root:x:0:0:root user:/root:/bin/sh
[root@ip-72-167-112-17 amrood]# vi testing
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

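Using Regular Expression

While matching patterns, you can use regular expressions, which provide more flexibility.

Check the following example, which matches all the lines starting with daemon and then deletes them: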

$ cat testing | sed '/^daemon/d'
root:x:0:0:root user:/root:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh

Following is an example that deletes all the lines ending with sh:

$ cat testing | sed '/sh$/d'
sync:x:4:65534:sync:/bin:/bin/sync

The following table lists five special characters and constructs that are very useful in regular expressions.

Sr.No.	Character & Description
1	^ Matches the beginning of lines
2	$ Matches the end of lines
3	. Matches any single character
4	* Matches zero or more occurrences of the previous character
5	[chars] Matches any one of the characters given in chars, where chars is a sequence of characters. You can use the - character to indicate a range of characters.
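Each entry in the table can be checked with a one-line sed -n '/.../p' filter. The five sample lines below are made up for illustration:

```shell
sample=$(printf '%s\n' 'daemon:/bin/sh' 'sync:/bin/sync' 'abc' 'ac' 'a3c')

# ^ anchors the match at the beginning of the line.
starts=$(printf '%s\n' "$sample" | sed -n '/^daemon/p')

# $ anchors the match at the end of the line.
ends=$(printf '%s\n' "$sample" | sed -n '/sh$/p')

# . matches exactly one character between a and c.
dot=$(printf '%s\n' "$sample" | sed -n '/^a.c$/p')

# * allows zero or more occurrences of the preceding b.
star=$(printf '%s\n' "$sample" | sed -n '/^ab*c$/p')

# [0-9] matches any one digit.
digit=$(printf '%s\n' "$sample" | sed -n '/^a[0-9]c$/p')

printf '%s\n' "$starts" "$ends" "$dot" "$star" "$digit"
```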

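Matching Characters

Look at a few more expressions that demonstrate the use of metacharacters. For example, the following patterns: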

Sr.No.	Expression & Description
1	/a.c/ Matches lines that contain strings such as a+c, a-c, abc, match, and a3c
2	/a*c/ Matches the same strings along with strings such as ace, yacc, and arctic
3	/[tT]he/ Matches the string The and the
4	/^$/ Matches blank lines
5	/^.*$/ Matches an entire line whatever it is
6	/  */ Matches one or more spaces (two spaces before the *; with a single space, / */ also matches zero spaces)
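A brief sketch of three of these patterns in use: deleting blank lines, selecting lines that contain The or the, and collapsing each run of spaces into a single space. The sample text is made up:

```shell
text=$(printf '%s\n' 'The first line' '' 'then   another' '' 'last one')

# /^$/ matches blank lines; d removes them.
no_blank=$(printf '%s\n' "$text" | sed '/^$/d')

# /[tT]he/ matches lines containing "The" or "the" (including inside words).
the_lines=$(printf '%s\n' "$text" | sed -n '/[tT]he/p')

# Two spaces followed by * means "one or more spaces"; squeeze each run to one.
squeezed=$(printf '%s\n' "$text" | sed -n '3s/  */ /gp')

printf '%s\n---\n%s\n---\n%s\n' "$no_blank" "$the_lines" "$squeezed"
```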

The following table shows some frequently used sets of characters:

Sr.No.	Set & Description
1	[a-z] Matches a single lowercase letter
2	[A-Z] Matches a single uppercase letter
3	[a-zA-Z] Matches a single letter
4	[0-9] Matches a single digit
5	[a-zA-Z0-9] Matches a single letter or digit

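Character Class Keywords

Some special keywords are commonly available to regular expressions, especially in GNU utilities that employ them. These are very useful for sed regular expressions because they simplify things and enhance readability.

For example, the characters a through z and the characters A through Z constitute one such class of characters, which has the keyword [[:alpha:]].

Using the alphabet character class keyword, this command prints only those lines in the /etc/syslog.conf file that start with a letter of the alphabet: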

$ cat /etc/syslog.conf | sed -n '/^[[:alpha:]]/p'
authpriv.*                         /var/log/secure
mail.*                             -/var/log/maillog
cron.*                             /var/log/cron
uucp,news.crit                     /var/log/spooler
local7.*                           /var/log/boot.log

The following table is a complete list of the available character class keywords in GNU sed.

Sr.No.	Character Class & Description
1	[[:alnum:]] Alphanumeric [a-z A-Z 0-9]
2	[[:alpha:]] Alphabetic [a-z A-Z]
3	[[:blank:]] Blank characters (spaces or tabs)
4	[[:cntrl:]] Control characters
5	[[:digit:]] Numbers [0-9]
6	[[:graph:]] Any visible characters (excludes whitespace)
7	[[:lower:]] Lowercase letters [a-z]
8	[[:print:]] Printable characters (non-control characters)
9	[[:punct:]] Punctuation characters
10	[[:space:]] Whitespace
11	[[:upper:]] Uppercase letters [A-Z]
12	[[:xdigit:]] Hex digits [0-9 a-f A-F]
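A few of these classes in action, as a sketch on made-up input. Note that the class keyword always sits inside an outer bracket expression, hence the doubled brackets:

```shell
sample=$(printf '%s\n' 'authpriv.* /var/log/secure' '# comment' '  indented' '42 answers')

# Lines that start with a letter.
alpha_start=$(printf '%s\n' "$sample" | sed -n '/^[[:alpha:]]/p')

# Lines that start with a digit.
digit_start=$(printf '%s\n' "$sample" | sed -n '/^[[:digit:]]/p')

# Strip leading whitespace from line 3.
trimmed=$(printf '%s\n' "$sample" | sed -n '3s/^[[:space:]]*//p')

# Remove every punctuation character.
no_punct=$(printf '%s\n' 'a.b,c!' | sed 's/[[:punct:]]//g')

printf '%s\n' "$alpha_start" "$digit_start" "$trimmed" "$no_punct"
```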

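Ampersand Referencing

The sed metacharacter & represents the contents of the pattern that was matched. For instance, say you have a file called phone.txt full of phone numbers, such as the following:

5555551212
5555551213
5555551214
6665551215
6665551216
7775551217

You want to surround the area code (the first three digits) with parentheses for easier reading. To do this, you can use the ampersand replacement character: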

$ sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' phone.txt
(555)5551212
(555)5551213
(555)5551214
(666)5551215
(666)5551216
(777)5551217

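In the pattern part you match the first 3 digits, and then, using &, you replace those 3 digits with the same digits surrounded by parentheses.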
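A self-contained sketch of the same substitution that pipes two sample numbers to sed instead of reading phone.txt. The second command is an addition: it shows that a literal ampersand in the replacement has to be escaped as \&:

```shell
# & in the replacement stands for the whole matched text.
wrapped=$(printf '%s\n' 5555551212 6665551215 |
    sed 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/')

# \& in the replacement is a literal ampersand.
literal=$(printf '%s\n' 'a and b' | sed 's/and/\&/')

printf '%s\n%s\n' "$wrapped" "$literal"
```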

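Using Multiple sed Commands

You can use multiple sed commands in a single sed invocation, as follows:

$ sed -e 'command1' -e 'command2' ... -e 'commandN' files

Here command1 through commandN are sed commands of the type discussed previously. These commands are applied to each of the lines in the list of files given by files.

Using the same mechanism, we can write the above phone number example as follows: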

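$ sed -e 's/^[[:digit:]]\{3\}/(&)/g' \
   -e 's/)[[:digit:]]\{3\}/&-/g' phone.txt
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217

Note: In the above example, instead of repeating the character class keyword [[:digit:]] three times, we follow it with \{3\}, which means the preceding regular expression is matched three times. The \ at the end of the first line is a shell line continuation; it must be the last character on that line, or you can drop it and type the whole command on one line.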
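As a sketch, the two commands can also be given in a single script separated by a semicolon, a form that common sed implementations accept for the s command. Both variants below should produce the same result on one sample number:

```shell
# Two -e options, one command each.
two_e=$(printf '%s\n' 5555551212 |
    sed -e 's/^[[:digit:]]\{3\}/(&)/' -e 's/)[[:digit:]]\{3\}/&-/')

# One script with the commands separated by a semicolon.
one_script=$(printf '%s\n' 5555551212 |
    sed 's/^[[:digit:]]\{3\}/(&)/; s/)[[:digit:]]\{3\}/&-/')

printf '%s\n%s\n' "$two_e" "$one_script"
```

The commands run in order on each line, so the second substitution sees the parentheses added by the first.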

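Back References

The ampersand metacharacter is useful, but even more useful is the ability to define specific regions in regular expressions. These special regions can be used as references in your replacement strings. By defining specific parts of a regular expression, you can then refer back to those parts with a special reference character.

To do back references, you first define a region and then refer back to that region. To define a region, you insert backslashed parentheses, \( and \), around each region of interest. The first region that you surround this way is then referenced by \1, the second region by \2, and so on.

Assume phone.txt has the following text: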

(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217

Try the following command:

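$ cat phone.txt | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/'
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214
Area code: (666) Second: 555- Third: 1215
Area code: (666) Second: 555- Third: 1216
Area code: (777) Second: 555- Third: 1217

Note: In the above example, each regular expression inside the backslashed parentheses is back referenced by \1, \2, and so on. Keep the sed script on a single line: inside the quotes, a backslash followed by a newline would put a literal newline into the replacement text.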
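A self-contained sketch of back references on a single sample number. The second command assumes the -E option (extended regular expressions, available in GNU and BSD sed), where groups are written with plain parentheses and a literal parenthesis is escaped. The third command is an extra illustration that rebuilds a formatted number from ten plain digits:

```shell
# Basic regular expressions: groups are written \( ... \).
bre=$(printf '%s\n' '(555)555-1212' |
    sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/')

# Extended regular expressions (-E): groups are ( ... ), literal ) is \).
ere=$(printf '%s\n' '(555)555-1212' |
    sed -E 's/(.*\))(.*-)(.*$)/Area code: \1 Second: \2 Third: \3/')

# Capture three digit groups and reassemble them with punctuation.
reformatted=$(printf '%s\n' 5555551212 |
    sed 's/^\([0-9]\{3\}\)\([0-9]\{3\}\)\([0-9]\{4\}\)$/(\1)\2-\3/')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$reformatted"
```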