Unix 简明教程
Unix / Linux - Regular Expressions with SED
在本篇教程中,我们将详细讨论 Unix 中的 SED 正则表达式。
In this chapter, we will discuss in detail about regular expressions with SED in Unix.
正则表达式是一个可以用来描述多个字符序列的字符串。包括 ed 、 sed 、 awk 、 grep 等许多不同的 Unix 命令都使用了正则表达式,在一定程度上甚至包括 vi 。
A regular expression is a string that can be used to describe several sequences of characters. Regular expressions are used by several different Unix commands, including ed, sed, awk, grep, and to a more limited extent, vi.
此处 SED 代表 *s*tream *ed*itor。此流定向编辑器完全用于执行脚本。因此,你输入其中的所有内容都将经过处理并传递到 STDOUT,并且它不会改变输入文件。
Here SED stands for *s*tream *ed*itor. This stream-oriented editor was created exclusively for executing scripts. Thus, all the input you feed into it passes through and goes to STDOUT and it does not change the input file.
Invoking sed
在正式开始之前,我们先确保有一份 /etc/passwd 文本文件的本地副本,以便进行练习 sed 。
Before we start, let us ensure we have a local copy of /etc/passwd text file to work with sed.
如前文所述,通过管道向 sed 发送数据就可以调用它,如下所示——
As mentioned previously, sed can be invoked by sending data through a pipe to it as follows −
$ cat /etc/passwd | sed
Usage: sed [OPTION]... {script-other-script} [input-file]...
-n, --quiet, --silent
suppress automatic printing of pattern space
-e script, --expression = script
...............................
cat 命令通过管道将 /etc/passwd 的内容转储到 sed ,再转储到 sed 的模式空间。模式空间是 sed 用于执行操作的内部工作缓冲区。
The cat command dumps the contents of /etc/passwd to sed through the pipe into sed’s pattern space. The pattern space is the internal work buffer that sed uses for its operations.
The sed General Syntax
以下是 sed 的通用语法——
Following is the general syntax for sed −
/pattern/action
此处, pattern 是正则表达式,而 action 是下表中给出的一个命令。如果 pattern 被忽略,则对每一行执行 action ,如我们在上面看到的那样。
Here, pattern is a regular expression, and action is one of the commands given in the following table. If pattern is omitted, action is performed for every line as we have seen above.
把模式包围起来的斜杠 (/)是必需的,因为它们用作分隔符。
The slash character (/) that surrounds the pattern are required because they are used as delimiters.
Sr.No. |
Range & Description |
1 |
p Prints the line |
2 |
d Deletes the line |
3 |
s/pattern1/pattern2/ Substitutes the first occurrence of pattern1 with pattern2 |
Deleting All Lines with sed
现在我们将了解如何使用 sed 删除所有行。再次调用 sed;但是,现在 sed 应该使用 editing command delete line ,该字符用字母 d - 表示
We will now understand how to delete all lines with sed. Invoke sed again; but the sed is now supposed to use the editing command delete line, denoted by the single letter d −
$ cat /etc/passwd | sed 'd'
$
不需要通过管道向 sed 发送文件即可调用 sed,可以指示 sed 从文件读取数据,如下面的示例所示。
Instead of invoking sed by sending a file to it through a pipe, the sed can be instructed to read the data from a file, as in the following example.
以下命令与前一个示例中的命令完全相同,不需要 cat 命令:
The following command does exactly the same as in the previous example, without the cat command −
$ sed -e 'd' /etc/passwd
$
The sed Addresses
sed 还支持地址。地址要么是文件中的特定位置,要么是应该应用特定编辑命令的范围。当 sed 遇到没有地址时,它会在文件中的每一行执行操作。
The sed also supports addresses. Addresses are either particular locations in a file or a range where a particular editing command should be applied. When the sed encounters no addresses, it performs its operations on every line in the file.
以下命令向您已在使用的 sed 命令添加了基本地址:
The following command adds a basic address to the sed command you’ve been using −
$ cat /etc/passwd | sed '1d' |more
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$
请注意,该 delete edit 命令前添加了数字 1。这指示 sed 对文件的第 1 行执行编辑命令。在此示例中,sed 将删除 /etc/password 的第一行并打印文件的其余部分。
Notice that the number 1 is added before the delete edit command. This instructs the sed to perform the editing command on the first line of the file. In this example, the sed will delete the first line of /etc/password and print the rest of the file.
The sed Address Ranges
现在我们将了解如何使用 the sed address ranges 。那么,如果您想从文件中移除多行,该怎么办?您可以用 sed 指定一个地址范围,如下所示:
We will now understand how to work with the sed address ranges. So what if you want to remove more than one line from a file? You can specify an address range with sed as follows −
$ cat /etc/passwd | sed '1, 5d' |more
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$
上述命令将应用于从 1 到 5 开始的所有行。这会删除前五行。
The above command will be applied on all the lines starting from 1 through 5. This deletes the first five lines.
尝试以下地址范围:
Try out the following address ranges −
Sr.No. |
Range & Description |
1 |
'4,10d' Lines starting from the 4th till the 10th are deleted |
2 |
'10,4d' Only 10th line is deleted, because the sed does not work in reverse direction |
3 |
'4,+5d' This matches line 4 in the file, deletes that line, continues to delete the next five lines, and then ceases its deletion and prints the rest |
4 |
'2,5!d' This deletes everything except starting from 2nd till 5th line |
5 |
'1~3d' This deletes the first line, steps over the next three lines, and then deletes the fourth line. Sed continues to apply this pattern until the end of the file. |
6 |
'2~2d' This tells sed to delete the second line, step over the next line, delete the next line, and repeat until the end of the file is reached |
7 |
'4,10p' Lines starting from 4th till 10th are printed |
8 |
'4,d' This generates the syntax error |
9 |
',10d' This would also generate syntax error |
Note − 在使用 p 操作时,您应该使用 -n 选项来避免重复打印行。查看以下两个命令之间的差异 −
Note − While using the p action, you should use the -n option to avoid repetition of line printing. Check the difference in between the following two commands −
$ cat /etc/passwd | sed -n '1,3p'
Check the above command without -n as follows −
$ cat /etc/passwd | sed '1,3p'
The Substitution Command
由 s 表示的替换命令将用您指定的任何其他字符串替换您指定的任何字符串。
The substitution command, denoted by s, will substitute any string that you specify with any other string that you specify.
为了用一个字符串替换另一个字符串,sed 需要有关第一个字符串结束位置和替换字符串开始位置的信息。为此,我们使用正斜杠 ( / ) 字符将这两个字符串作为书挡。
To substitute one string with another, the sed needs to have the information on where the first string ends and the substitution string begins. For this, we proceed with bookending the two strings with the forward slash (/) character.
以下命令用字符串 amrood 替换一行中字符串 root 的首次出现。
The following command substitutes the first occurrence on a line of the string root with the string amrood.
$ cat /etc/passwd | sed 's/root/amrood/'
amrood:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
..........................
需要注意的是,sed 仅替换一行中的首次出现。如果字符串 root 在一行中出现多次,则仅替换第一个匹配项。
It is very important to note that sed substitutes only the first occurrence on a line. If the string root occurs more than once on a line only the first match will be replaced.
为了让 sed 执行全局替换,请按以下方式将字母 g 添加到命令的末尾 −
For the sed to perform a global substitution, add the letter g to the end of the command as follows −
$ cat /etc/passwd | sed 's/root/amrood/g'
amrood:x:0:0:amrood user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
...........................
Substitution Flags
除了 g 标志外,还可以传递许多其他有用的标志,并且您一次可以指定多个标志。
There are a number of other useful flags that can be passed in addition to the g flag, and you can specify more than one at a time.
Sr.No. |
Flag & Description |
1 |
g Replaces all matches, not just the first match |
2 |
NUMBER Replaces only NUMBERth match |
3 |
p If substitution was made, then prints the pattern space |
4 |
w FILENAME If substitution was made, then writes result to FILENAME |
5 |
I or i Matches in a case-insensitive manner |
6 |
M or m In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline |
Using an Alternative String Separator
假设您必须对包含正斜杠字符的字符串执行替换。在这种情况下,您可以通过在 s 后提供指定字符来指定不同的分隔符。
Suppose you have to do a substitution on a string that includes the forward slash character. In this case, you can specify a different separator by providing the designated character after the s.
$ cat /etc/passwd | sed 's:/root:/amrood:g'
amrood:x:0:0:amrood user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
在上面的示例中,我们使用了 : 作为 delimiter 而不是斜杠 /,因为我们试图搜索 /root 而不是简单的 root。
In the above example, we have used : as the delimiter instead of slash / because we were trying to search /root instead of the simple root.
Replacing with Empty Space
使用空替换字符串从 /etc/passwd 文件中完全删除 root 字符串 −
Use an empty substitution string to delete the root string from the /etc/passwd file entirely −
$ cat /etc/passwd | sed 's/root//g'
:x:0:0::/:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
Address Substitution
如果您只想在第 10 行上用字符串 quiet 替换字符串 sh ,则可以按如下方式指定:
If you want to substitute the string sh with the string quiet only on line 10, you can specify it as follows −
$ cat /etc/passwd | sed '10s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/quiet
类似地,要进行地址范围替换,您可以执行以下操作:
Similarly, to do an address range substitution, you could do something like the following −
$ cat /etc/passwd | sed '1,5s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/quiet
daemon:x:1:1:daemon:/usr/sbin:/bin/quiet
bin:x:2:2:bin:/bin:/bin/quiet
sys:x:3:3:sys:/dev:/bin/quiet
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
正如您从输出中看到的,前五行的字符串 sh 已更改为 quiet ,但其余行保持不变。
As you can see from the output, the first five lines had the string sh changed to quiet, but the rest of the lines were left untouched.
The Matching Command
您将使用 p 选项和 -n 选项来打印所有匹配行,如下所示:
You would use the p option along with the -n option to print all the matching lines as follows −
$ cat testing | sed -n '/root/p'
root:x:0:0:root user:/root:/bin/sh
[root@ip-72-167-112-17 amrood]# vi testing
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
Using Regular Expression
在匹配模式时,可以使用正则表达式,它提供更大的灵活性。
While matching patterns, you can use the regular expression which provides more flexibility.
检查以下示例,它匹配以 daemon 开头的所有行,然后删除它们 −
Check the following example which matches all the lines starting with daemon and then deletes them −
$ cat testing | sed '/^daemon/d'
root:x:0:0:root user:/root:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
以下是删除所有以 sh 结尾的行的示例 −
Following is the example which deletes all the lines ending with sh −
$ cat testing | sed '/sh$/d'
sync:x:4:65534:sync:/bin:/bin/sync
下表列出了四个在正则表达式中非常有用的特殊字符。
The following table lists four special characters that are very useful in regular expressions.
Sr.No. |
Character & Description |
1 |
^ Matches the beginning of lines |
2 |
$ Matches the end of lines |
3 |
. Matches any single character |
4 |
* Matches zero or more occurrences of the previous character |
5 |
[chars] Matches any one of the characters given in chars, where chars is a sequence of characters. You can use the - character to indicate a range of characters. |
Matching Characters
再看一些表达式,以演示 metacharacters 的用法。例如,以下模式 −
Look at a few more expressions to demonstrate the use of metacharacters. For example, the following pattern −
Sr.No. |
Expression & Description |
1 |
/a.c/ Matches lines that contain strings such as a+c, a-c, abc, match, and a3c |
2 |
/a*c/ Matches the same strings along with strings such as ace, yacc, and arctic |
3 |
/[tT]he/ Matches the string The and the |
4 |
/^$/ Matches blank lines |
5 |
/^.$/* Matches an entire line whatever it is |
6 |
/ */ Matches one or more spaces |
7 |
/^$/ Matches blank lines |
下表显示了一些常用字符集 −
Following table shows some frequently used sets of characters −
Sr.No. |
Set & Description |
1 |
[a-z] Matches a single lowercase letter |
2 |
[A-Z] Matches a single uppercase letter |
3 |
[a-zA-Z] Matches a single letter |
4 |
[0-9] Matches a single number |
5 |
[a-zA-Z0-9] Matches a single letter or number |
Character Class Keywords
某些特殊关键字通常可用于 regexps ,尤其是使用 regexps 的 GNU 实用程序。这些对于 sed 正则表达式非常有用,因为它们简化了操作并增强了可读性。
Some special keywords are commonly available to regexps, especially GNU utilities that employ regexps. These are very useful for sed regular expressions as they simplify things and enhance readability.
For example, the characters a through z and the characters A through Z, constitute one such class of characters that has the keyword [id=":alpha:"]
使用字母字符类关键字,此命令只打印 /etc/syslog.conf 文件中以字母开头的行 −
Using the alphabet character class keyword, this command prints only those lines in the /etc/syslog.conf file that start with a letter of the alphabet −
$ cat /etc/syslog.conf | sed -n '/^[[:alpha:]]/p'
authpriv.* /var/log/secure
mail.* -/var/log/maillog
cron.* /var/log/cron
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log
下表是 GNU sed 中可用字符类关键字的完整列表。
The following table is a complete list of the available character class keywords in GNU sed.
Sr.No. |
Character Class & Description |
1 |
|
2 |
|
3 |
|
4 |
|
5 |
|
6 |
|
7 |
|
8 |
[id=":print:"] Printable characters (non-control characters) |
9 |
|
10 |
|
11 |
|
12 |
Aampersand Referencing
sed metacharacter & 表示匹配的模式的内容。例如,假设你有一个名为 phone.txt 的文件,其中包含许多电话号码,如下所示 −
The sed metacharacter & represents the contents of the pattern that was matched. For instance, say you have a file called phone.txt full of phone numbers, such as the following −
5555551212
5555551213
5555551214
6665551215
6665551216
7775551217
你想将 area code (前三个数字)括在括号中以方便阅读。为此,你可以使用替换字符“&” −
You want to make the area code (the first three digits) surrounded by parentheses for easier reading. To do this, you can use the ampersand replacement character −
$ sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' phone.txt
(555)5551212
(555)5551213
(555)5551214
(666)5551215
(666)5551216
(777)5551217
在模式部分,你匹配前 3 个数字,然后使用 & 来用 parentheses 包围替换这 3 个数字。
Here in the pattern part you are matching the first 3 digits and then using & you are replacing those 3 digits with the surrounding parentheses.
Using Multiple sed Commands
你可以使用以下方法在单个 sed 命令中使用多个 sed 命令 −
You can use multiple sed commands in a single sed command as follows −
$ sed -e 'command1' -e 'command2' ... -e 'commandN' files
此处 command1 到 commandN 为前面讨论过的类型 sed 命令。这些命令应用于给定文件中列表中的每一行。
Here command1 through commandN are sed commands of the type discussed previously. These commands are applied to each of the lines in the list of files given by files.
使用相同机制,我们可以如下编写上面的电话号码示例 −
Using the same mechanism, we can write the above phone number example as follows −
$ sed -e 's/^[[:digit:]]\{3\}/(&)/g' \
-e 's/)[[:digit:]]\{3\}/&-/g' phone.txt
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217
Back References
ampersand metacharacter 很有用,但更重要的是能够定义正则表达式中的特定区域。这些特殊区域可以用作替换字符串中的参考。通过定义正则表达式的特定部分,你可以使用特殊参考字符引用这些部分。
The ampersand metacharacter is useful, but even more useful is the ability to define specific regions in regular expressions. These special regions can be used as reference in your replacement strings. By defining specific parts of a regular expression, you can then refer back to those parts with a special reference character.
要执行 back references ,你必须先定义一个区域,然后引用该区域。要定义一个区域,你可以在每个感兴趣的区域周围插入 backslashed parentheses 。你用反斜杠包围的第一个区域由 \1 引用,第二个区域由 \2 引用,依此类推。
To do back references, you have to first define a region and then refer back to that region. To define a region, you insert backslashed parentheses around each region of interest. The first region that you surround with backslashes is then referenced by \1, the second region by \2, and so on.
假设 phone.txt 具有以下文本 −
Assuming phone.txt has the following text −
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217
尝试以下命令 −
Try the following command −
$ cat phone.txt | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area \
code: \1 Second: \2 Third: \3/'
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214
Area code: (666) Second: 555- Third: 1215
Area code: (666) Second: 555- Third: 1216
Area code: (777) Second: 555- Third: 1217
Note − 在上述示例中,括号内的每个正则表达式都将由 \1 、 \2 等进行反向引用。我们在此处使用 \ 来换行。在运行命令之前,应该将其删除。
Note − In the above example, each regular expression inside the parenthesis would be back referenced by \1, \2 and so on. We have used \ to give line break here. This should be removed before running the command.