📜  Java Program to Remove Duplicate Lines from a Text File

📅  Last modified: 2022-05-13 01:55:16.035000             🧑  Author: Mango


Prerequisites: PrintWriter, BufferedReader

Given a file input.txt, the task is to remove its duplicate lines and save the result in another file, say output.txt.
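
To make the task concrete, here is a minimal in-memory sketch of the expected transformation: only the first occurrence of each line is kept, and the original order is preserved. The class name DedupExample and the method dedupe are hypothetical names chosen for illustration; they are not part of the original programs, which work on files rather than lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DedupExample {
    // Hypothetical helper: keeps the first occurrence of each line, preserving order.
    public static List<String> dedupe(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // Linear scan of what has been kept so far, like the naive file-based approach
            if (!result.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("apple", "banana", "apple", "cherry", "banana");
        System.out.println(dedupe(input)); // prints [apple, banana, cherry]
    }
}
```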

Naive algorithm:

1. Create PrintWriter object for output.txt
2. Open BufferedReader for input.txt
3. Run a loop for each line of input.txt
   3.1 flag = false
   3.2 Open BufferedReader for output.txt
   3.3 Run a loop for each line of output.txt
      ->  If line of output.txt is equal to current line of input.txt
            -> flag = true
            -> break loop
   3.4 Check flag, if false
      -> write current line of input.txt to output.txt
      -> flush PrintWriter stream

4. Close resources.

To run the program below successfully, input.txt must exist in the same folder, or you must provide its full path.

// Java program to remove
// duplicates from input.txt and 
// save output to output.txt
  
import java.io.*;
  
public class FileOperation
{
    public static void main(String[] args) throws IOException 
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");
          
        // BufferedReader object for input.txt
        BufferedReader br1 = new BufferedReader(new FileReader("input.txt"));
          
        String line1 = br1.readLine();
          
        // loop for each line of input.txt
        while(line1 != null)
        {
            boolean flag = false;
              
            // BufferedReader object for output.txt
            BufferedReader br2 = new BufferedReader(new FileReader("output.txt"));
              
            String line2 = br2.readLine();
              
            // loop for each line of output.txt
            while(line2 != null)
            {
                  
                if(line1.equals(line2))
                {
                    flag = true;
                    break;
                }
                  
                line2 = br2.readLine();
              
            }
              
            // close the reader so a file handle is not leaked on every pass
            br2.close();
              
            // if flag = false
            // write line of input.txt to output.txt
            if(!flag){
                pw.println(line1);
                  
                // flush now so the next pass over output.txt can see this line
                pw.flush();
            }
              
            line1 = br1.readLine();
              
        }
          
        // closing resources
        br1.close();
        pw.close();
          
        System.out.println("File operation performed successfully");
    }
}

Output:

File operation performed successfully

Note: If output.txt already exists in the cwd (current working directory), the above program overwrites it; otherwise a new file is created.

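A better solution is to use a HashSet to store each line of input.txt. Because a set never holds duplicate values, adding a line that is already present leaves the set unchanged and add() returns false. A line is written to output.txt only when it was not already in the set, so the input is processed in a single pass instead of re-reading output.txt for every line.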
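
The property this approach relies on is the boolean returned by HashSet.add. The short sketch below shows it in isolation; the class name HashSetAddDemo and the helper isFirstOccurrence are hypothetical names used only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetAddDemo {
    // Hypothetical helper: returns true only the first time a given line is seen,
    // because Set.add returns false when the element is already present.
    public static boolean isFirstOccurrence(Set<String> seen, String line) {
        return seen.add(line);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        System.out.println(isFirstOccurrence(seen, "apple"));  // prints true
        System.out.println(isFirstOccurrence(seen, "banana")); // prints true
        System.out.println(isFirstOccurrence(seen, "apple"));  // prints false
    }
}
```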

To run the program below successfully, input.txt must exist in the same folder, or you must provide its full path.

// Efficient Java program to remove
// duplicates from input.txt and 
// save output to output.txt
  
import java.io.*;
import java.util.HashSet;
  
public class FileOperation
{
    public static void main(String[] args) throws IOException 
    {
        // PrintWriter object for output.txt
        PrintWriter pw = new PrintWriter("output.txt");
          
        // BufferedReader object for input.txt
        BufferedReader br = new BufferedReader(new FileReader("input.txt"));
          
        String line = br.readLine();
          
        // set to store the unique lines seen so far
        HashSet<String> hs = new HashSet<>();
          
        // loop for each line of input.txt
        while(line != null)
        {
            // write only if not
            // present in hashset
            if(hs.add(line))
                pw.println(line);
              
            line = br.readLine();
              
        }
          
        pw.flush();
          
        // closing resources
        br.close();
        pw.close();
          
        System.out.println("File operation performed successfully");
    }
}

Output:

File operation performed successfully

Note: If output.txt already exists in the cwd (current working directory), the above program overwrites it; otherwise a new file is created.
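
As a possible variation on the HashSet approach, the sketch below takes the input and output paths as arguments instead of hard-coding them, and uses try-with-resources so both files are closed even if reading fails. The class name DedupLines, the method removeDuplicateLines, and the choice of UTF-8 are assumptions made for this example; the programs above use the platform default charset and fixed file names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

public class DedupLines {
    // Hypothetical helper: copies each distinct line of 'in' to 'out',
    // keeping the first occurrence, and returns the number of lines written.
    public static int removeDuplicateLines(Path in, Path out) throws IOException {
        Set<String> seen = new HashSet<>();
        int written = 0;
        // Both resources are closed automatically, in reverse order of creation.
        // Files.newBufferedWriter creates the file or truncates an existing one.
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             PrintWriter pw = new PrintWriter(Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                // add returns false for a line that was already seen
                if (seen.add(line)) {
                    pw.println(line);
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Fall back to the file names used in this article when no arguments are given
        Path in = Paths.get(args.length > 0 ? args[0] : "input.txt");
        Path out = Paths.get(args.length > 1 ? args[1] : "output.txt");
        int n = removeDuplicateLines(in, out);
        System.out.println("Wrote " + n + " unique lines to " + out);
    }
}
```

With this shape, a full path can be supplied on the command line, for example `java DedupLines /path/to/input.txt /path/to/output.txt`, where both paths are placeholders.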