The join Command in Linux

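The join command in UNIX is a command-line utility that joins the lines of two files on a common field.

Suppose you have two files and need to combine them so that the output is more meaningful. For example, one file may contain names and the other IDs, and the requirement is to combine the two so that each name and its corresponding ID appear on the same line. The join command is the tool for this job. It joins two files on a key field that is present in both. Fields in the input files can be separated by whitespace or by any other delimiter.
Syntax: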

$join [OPTION] FILE1 FILE2

Example: suppose there are two files, file1.txt and file2.txt, and we want to combine their contents.

// displaying the contents of first file //
$cat file1.txt
1 AAYUSH
2 APAAR
3 HEMANT
4 KARTIK

// displaying contents of second file //
$cat file2.txt
1 101
2 102
3 103
4 104

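To combine the two files, they must share a common field. In this case, the numbers 1, 2, ... serve as the common field in both files.

NOTE: When using the join command, both input files should be sorted on the KEY on which we are going to join them.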

//..using join command...//
$join file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104

// by default join command takes the 
first column as the key to join as 
in the above case //

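So each output line contains the key, followed by the remaining columns of the matching line in the first file, file1.txt, and then the remaining columns of the matching line in the second file, file2.txt.

Now, if we want to create a new file with the joined contents, we can use the following command: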
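The note about sorted input can be made concrete with a small sketch. The file names names.txt and ids.txt and the temporary directory are hypothetical, chosen only for illustration; the sketch assumes a POSIX shell with sort, join and mktemp available. Both files are sorted on field 1 under the same locale that join runs in, so that the two tools agree on the ordering.

```shell
#!/bin/sh
# Hypothetical unsorted inputs; names.txt and ids.txt are illustration-only names.
workdir=$(mktemp -d)
printf '3 HEMANT\n1 AAYUSH\n2 APAAR\n' > "$workdir/names.txt"
printf '2 102\n3 103\n1 101\n' > "$workdir/ids.txt"

# Sort both files on field 1, using the same locale that join will use.
LC_ALL=C sort -k1,1 "$workdir/names.txt" > "$workdir/names.sorted"
LC_ALL=C sort -k1,1 "$workdir/ids.txt" > "$workdir/ids.sorted"

# Join the sorted copies on the default key (field 1).
sorted_join=$(LC_ALL=C join "$workdir/names.sorted" "$workdir/ids.sorted")
printf '%s\n' "$sorted_join"

rm -rf "$workdir"
```

Sorting into separate files keeps the originals untouched, which is convenient when the unsorted order matters elsewhere.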

$join file1.txt file2.txt > newjoinfile.txt

//..this will direct the output of joined files
into a new file newjoinfile.txt 
containing the same output as the example 
above..//

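Options for the join command:

1. -a FILENUM : also print unpairable lines from file FILENUM, where FILENUM is 1 or 2, corresponding to FILE1 or FILE2.
2. -e EMPTY : replace missing input fields with EMPTY.
3. -i, --ignore-case : ignore differences in case when comparing fields.
4. -j FIELD : equivalent to "-1 FIELD -2 FIELD".
5. -o FORMAT : obey FORMAT while constructing each output line.
6. -t CHAR : use CHAR as the input and output field separator.
7. -v FILENUM : like -a FILENUM, but suppress joined output lines.
8. -1 FIELD : join on this FIELD of file 1.
9. -2 FIELD : join on this FIELD of file 2.
10. --check-order : check that the input is correctly sorted, even if all input lines are pairable.
11. --nocheck-order : do not check that the input is correctly sorted.
12. --help : display a help message and exit.
13. --version : display version information and exit.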
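The -e and -o options from the list above are not demonstrated elsewhere in this article, so here is a minimal sketch of how they are commonly combined with -a. The placeholder text NULL and the temporary directory are assumptions made for illustration. In the -o format, 0 stands for the join field and M.N for field N of file M.

```shell
#!/bin/sh
# Illustration-only data: key 5 exists in file1.txt but not in file2.txt.
workdir=$(mktemp -d)
printf '1 AAYUSH\n2 APAAR\n5 DEEPAK\n' > "$workdir/file1.txt"
printf '1 101\n2 102\n' > "$workdir/file2.txt"

# -a 1        : also print unpairable lines from file 1
# -o 0,1.2,2.2: output the key, field 2 of file 1, field 2 of file 2
# -e NULL     : fill any missing output field with the text NULL
filled=$(LC_ALL=C join -a 1 -e NULL -o 0,1.2,2.2 \
    "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$filled"

rm -rf "$workdir"
```

The last output line is `5 DEEPAK NULL`: the unpaired line keeps a fixed number of columns, which makes the result easier to process further.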

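Using join with options
1. Using the -a FILENUM option: Sometimes one of the files contains extra lines whose key has no match in the other file. By default, join prints only the pairable lines. For example, even if file1.txt contains one extra line, the output of join stays the same as long as the contents of file2.txt are unchanged: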

//displaying the contents of file1.txt//
$cat file1.txt
1 AAYUSH
2 APAAR
3 HEMANT
4 KARTIK
5 DEEPAK

//displaying contents of file2.txt//
$cat file2.txt
1 101
2 102
3 103
4 104

//using join command//
$join file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104

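// although file1.txt has an extra line, the
output is not affected because the line with key 5 in
file1.txt is unpairable with any line in file2.txt//

What if such unpairable lines are important and must be visible after the files are joined? In that case, we can use the -a option with join, which also prints these unpairable lines. The option takes a file number so that the tool knows which file's unpairable lines to print.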

//using join with -a option//

//1 is used with -a to also display the unpairable
lines of the first file passed//

$join file1.txt file2.txt -a 1
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104
5 DEEPAK

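//the line with key 5 in the first file is
also displayed with the help of the -a option
although it is unpairable//

2. Using the -v option: If you want to print only the unpairable lines, that is, suppress the paired lines in the output, use the -v option with join.
This option takes a file number in exactly the same way as -a does (in the example below, 1 is used with -v).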

//using -v option with join//

$join file1.txt file2.txt -v 1
5 DEEPAK 

//the output only prints unpairable lines found
in first file passed//
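The -a and -v options described above can each be given for either file. The sketch below uses illustration-only data in which each file has one key the other lacks. It shows -a 1 -a 2 together (often described as a full outer join) and -v 2 on its own.

```shell
#!/bin/sh
# Illustration-only data: key 5 is only in file1.txt, key 6 only in file2.txt.
workdir=$(mktemp -d)
printf '1 AAYUSH\n2 APAAR\n5 DEEPAK\n' > "$workdir/file1.txt"
printf '1 101\n2 102\n6 106\n' > "$workdir/file2.txt"

# Paired lines plus the unpairable lines of both files.
outer_join=$(LC_ALL=C join -a 1 -a 2 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$outer_join"

# Only the lines of file 2 that have no partner in file 1.
only_in_second=$(LC_ALL=C join -v 2 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$only_in_second"

rm -rf "$workdir"
```

The second command prints just `6 106`, which is a quick way to find keys that are missing from the first file.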

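3. Using the -1, -2 and -j options: As we already know, join combines the lines of the files on a common field, which by default is the first field. However, the common key is not always the first column in both files. The join command provides options for the case where the common key is not the first column.
If you want the second field of either file, or of both files, to be the common field for the join, you can do this with the -1 and -2 command-line options. Here -1 and -2 stand for the first and second file, and each option takes a numeric argument that refers to the join field of the corresponding file. The example below should make this clear: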

//displaying contents of first file//
$cat file1.txt
AAYUSH 1
APAAR 2
HEMANT 3
KARTIK 4

//displaying contents of second file//
$cat file2.txt
101 1
102 2
103 3
104 4

//now using join command //

$join -1 2 -2 2 file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104

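//here -1 2 refers to the use of column 2 of the
first file as the common field and -2 2
refers to the use of column 2 of the second
file as the common field for joining//

So this is how we can use a column other than the first as the common field for joining.
If the common field is in the same position in both files (other than the first), we can simply replace the -1 [field] -2 [field] part of the command with -j [field]. So in the case above, the command could be: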

//using -j option with join//

$join -j2 file1.txt file2.txt
1 AAYUSH 101
2 APAAR 102
3 HEMANT 103
4 KARTIK 104
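The -1 and -2 options do not have to name the same field number. As a small sketch with illustration-only data, the key below is the second column of the first file but the first column of the second file:

```shell
#!/bin/sh
# Illustration-only data: the key is column 2 in file1.txt and column 1 in file2.txt.
workdir=$(mktemp -d)
printf 'AAYUSH 1\nAPAAR 2\nHEMANT 3\n' > "$workdir/file1.txt"
printf '1 101\n2 102\n3 103\n' > "$workdir/file2.txt"

# Join field 2 of file 1 against field 1 of file 2.
mixed_join=$(LC_ALL=C join -1 2 -2 1 "$workdir/file1.txt" "$workdir/file2.txt")
printf '%s\n' "$mixed_join"

rm -rf "$workdir"
```

As in the earlier examples, the key is printed first, followed by the remaining fields of the first file and then those of the second.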

4. Using the -i option: Another thing about the join command is that by default it is case sensitive. For example, consider the following files:

//displaying contents of file1.txt//
$cat file1.txt
A AAYUSH
B APAAR
C HEMANT
D KARTIK

//displaying contents of file2.txt//
$cat file2.txt
a 101
b 102
c 103
d 104

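Now, if you try to join these two files on the default (first) common field, join prints nothing. This is because the case of the key field differs between the two files. To make join ignore this difference in case, use the -i command-line option.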

//using -i option with join//
$join -i file1.txt file2.txt
A AAYUSH 101
B APAAR 102
C HEMANT 103
D KARTIK 104

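5. Using the --nocheck-order option: The join command can check whether the supplied input is sorted and report it if it is not. When neither --check-order nor --nocheck-order is given, GNU join reports wrongly sorted input only if it also finds unpairable lines. To suppress this error/warning, use the --nocheck-order option. Note that this only silences the message; lines that are out of order may still fail to pair. For example: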

//syntax of join with --nocheck-order option//

$join --nocheck-order file1 file2
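The opposite option, --check-order, can be used to verify input before relying on the result. The sketch below assumes GNU coreutils join, since --check-order is not part of POSIX, and uses illustration-only data in which the first file is deliberately out of order. It relies on join exiting with a non-zero status when --check-order detects unsorted input.

```shell
#!/bin/sh
# Illustration-only data: file1.txt is deliberately NOT sorted on its key.
workdir=$(mktemp -d)
printf '2 APAAR\n1 AAYUSH\n' > "$workdir/file1.txt"
printf '1 101\n2 102\n' > "$workdir/file2.txt"

# With --check-order, GNU join fails when it meets an out-of-order line.
if LC_ALL=C join --check-order "$workdir/file1.txt" "$workdir/file2.txt" \
    >/dev/null 2>&1; then
    check_status=sorted
else
    check_status=unsorted
fi
printf 'input is %s\n' "$check_status"

rm -rf "$workdir"
```

A check like this fits well in scripts, where silently dropped matches would otherwise go unnoticed.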

6. Using the -t option: Most of the time, files contain some delimiter to separate the columns. Let us update the files to use a comma delimiter.

$cat file1.txt
1, AAYUSH
2, APAAR
3, HEMANT
4, KARTIK
5, DEEPAK

//displaying contents of file2.txt//
$cat file2.txt
1, 101
2, 102
3, 103
4, 104

The -t option is what we use to specify the delimiter in this case.
Since the comma is the delimiter, we specify it along with -t.

//using join with -t option//

$join -t, file1.txt file2.txt
1, AAYUSH, 101
2, APAAR, 102
3, HEMANT, 103
4, KARTIK, 104
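In the example above, the space after each comma is part of the following field, which is why it also appears in the output. A minimal sketch with illustration-only CSV data that has no spaces after the commas shows the more common case, with the delimiter quoted to keep it safe from the shell:

```shell
#!/bin/sh
# Illustration-only CSV data without spaces after the commas.
workdir=$(mktemp -d)
printf '1,AAYUSH\n2,APAAR\n5,DEEPAK\n' > "$workdir/file1.csv"
printf '1,101\n2,102\n' > "$workdir/file2.csv"

# -t ',' sets the comma as both the input and the output field separator.
csv_join=$(LC_ALL=C join -t ',' "$workdir/file1.csv" "$workdir/file2.csv")
printf '%s\n' "$csv_join"

rm -rf "$workdir"
```

The line with key 5 is left out here for the same reason as before: it has no partner in the second file, and -a was not given.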