Java Program to Extract Paragraphs from a Word Document

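This article shows how to extract paragraphs from a Word document with the getParagraphs() method of the XWPFDocument class in the Apache POI package. Apache POI is a project developed and maintained by the Apache Software Foundation. It provides Java libraries for a wide range of operations on Microsoft Office files.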
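
What counts as a "paragraph" here follows from the file format. A .docx file is a ZIP package whose main part, conventionally word/document.xml, stores each paragraph as a <w:p> element, with its text held in <w:t> elements inside runs (<w:r>). The following standard-library sketch illustrates that structure only and is not how Apache POI is implemented. It collects the <w:p> elements that are direct children of <w:body>, which roughly matches what getParagraphs() returns (paragraphs inside tables are skipped). The class name DocxXmlParagraphs is hypothetical, and the sketch ignores tabs, line breaks and fields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: illustrates the WordprocessingML structure, not POI's code
public class DocxXmlParagraphs {

    // Namespace bound to the "w:" prefix in word/document.xml
    public static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each <w:p> that is a direct child of <w:body>,
    // concatenating the <w:t> elements of its runs
    public static List<String> paragraphsFromXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookups below
        Document doc = factory.newDocumentBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        NodeList bodies = doc.getElementsByTagNameNS(W_NS, "body");
        if (bodies.getLength() == 0) {
            return result; // no body element, so no paragraphs
        }
        // Walk only top-level children, so paragraphs nested in tables are skipped
        for (Node n = bodies.item(0).getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && W_NS.equals(n.getNamespaceURI())
                    && "p".equals(n.getLocalName())) {
                StringBuilder text = new StringBuilder();
                NodeList texts = ((Element) n).getElementsByTagNameNS(W_NS, "t");
                for (int i = 0; i < texts.getLength(); i++) {
                    text.append(texts.item(i).getTextContent());
                }
                result.add(text.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
            + "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
            + "<w:p/>"
            + "<w:tbl><w:tr><w:tc><w:p><w:r><w:t>cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>"
            + "<w:p><w:r><w:t>Last paragraph</w:t></w:r></w:p>"
            + "</w:body></w:document>";
        for (String para : paragraphsFromXml(xml)) {
            System.out.println(para);
        }
    }
}
```

Note that a paragraph split into several runs (for example because part of it is bold) still comes back as one string, and an empty <w:p/> yields an empty string, just as an empty paragraph does in Word.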

To extract paragraphs from a Word file, the following Apache library must be available on the classpath:

poi-ooxml.jar


Approach

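Specify the path of the Word document.
Create FileInputStream and XWPFDocument objects for the Word document.
Retrieve the list of paragraphs with the getParagraphs() method.
Iterate over the list of paragraphs and print each one.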

Implementation

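Step 1: Get the path of the current working directory, where the Word document is located.
Step 2: Create a FileInputStream for the file at the path specified above.
Step 3: Create a document object (XWPFDocument) for the Word document.
Step 4: Retrieve the list of paragraphs from the Word file with the getParagraphs() method.
Step 5: Iterate through the list of paragraphs.
Step 6: Print each paragraph.
Step 7: Close the document and its input stream.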
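
Steps 1, 2 and 7 are ordinary file handling. As an illustration that needs no POI, the sketch below builds the same path, opens the .docx as a ZIP archive with java.util.zip.ZipFile, reads its main XML part, and closes everything through try-with-resources. It assumes the conventional part name word/document.xml (strictly, the main part is located through the package relationships), and the class name DocxPartReader is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: reads the raw XML of the main document part of a .docx
public class DocxPartReader {

    // Assumes the conventional part name; real packages declare it in _rels/.rels
    static final String MAIN_PART = "word/document.xml";

    public static String readDocumentXml(String path) throws IOException {
        // try-with-resources closes the archive even if reading fails (Step 7)
        try (ZipFile zip = new ZipFile(new File(path))) {
            ZipEntry entry = zip.getEntry(MAIN_PART);
            if (entry == null) {
                throw new IOException(MAIN_PART + " not found in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Step 1: WordFile.docx in the current working directory
        String path = System.getProperty("user.dir") + File.separator + "WordFile.docx";
        System.out.println(readDocumentXml(path));
    }
}
```

The returned string could then be handed to an XML parser to collect the paragraphs; in the POI-based program, XWPFDocument performs both the unzipping and the parsing.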

Sample Input

The content of the Word document is as follows:

Implementation

Example


// Java program to extract paragraphs from a Word Document
  
// Importing IO package for basic file handling
import java.io.*;
import java.util.List;
// Importing Apache POI package
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
  
// Main class to extract paragraphs from word document
public class GFG {
  
    // Main driver method
    public static void main(String[] args) throws Exception
    {
  
        // Step 1: Getting path of the current working
        // directory where the word document is located
        String path = System.getProperty("user.dir");
        path = path + File.separator + "WordFile.docx";
  
        // Step 2: Opening an input stream on the file at the
        // above specified path.
        FileInputStream fin = new FileInputStream(path);
  
        // Step 3: Creating a document object for the word
        // document.
        XWPFDocument document = new XWPFDocument(fin);
  
        // Step 4: Using the getParagraphs() method to
        // retrieve the list of paragraphs from the word
        // file.
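        // A raw List would not compile in the for-each loop below,
        // so the element type is given explicitly.
        List<XWPFParagraph> paragraphs
            = document.getParagraphs();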
  
        // Step 5: Iterating through the list of paragraphs
        for (XWPFParagraph para : paragraphs) {
  
            // Step 6: Printing the paragraphs
            System.out.println(para.getText() + "\n");
        }
  
        // Step 7: Closing the document and its input stream
        document.close();
        fin.close();
    }
}
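
The program prints every paragraph, including empty ones, so blank lines in the Word file turn into extra blank lines in the output. If only paragraphs with visible text are wanted, one option is to post-process the strings returned by getText(). The helper below is a sketch; the names ParagraphFormatter and numberNonEmpty are hypothetical and not part of Apache POI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: filters and numbers paragraph texts after extraction
public class ParagraphFormatter {

    // Keeps only paragraphs with visible text and prefixes each with its position
    public static List<String> numberNonEmpty(List<String> texts) {
        List<String> lines = new ArrayList<>();
        int n = 1;
        for (String text : texts) {
            if (text == null || text.isBlank()) {
                continue; // skip empty paragraphs (blank lines in the Word file)
            }
            lines.add(n + ". " + text.strip());
            n++;
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("Title", "", "   ", "First paragraph");
        for (String line : numberNonEmpty(texts)) {
            System.out.println(line); // prints "1. Title" then "2. First paragraph"
        }
    }
}
```

With POI, the input list could be built from the document as document.getParagraphs().stream().map(XWPFParagraph::getText).collect(Collectors.toList()).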
