PyPDF2 Python PDF - Python (1)


📅 Last modified: 2023-12-03 14:45:45.208000 🧑 Author: Mango

PyPDF2 is a Python library for manipulating PDF files and extracting data from them. It works on single PDF files and can also merge multiple PDF files into one.

Installation

To install PyPDF2, you can use pip:

pip install PyPDF2
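
PyPDF2's class and method names changed between major versions, so it can help to confirm which version pip actually installed. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of PyPDF2.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("PyPDF2")
    if found is None:
        print("PyPDF2 is not installed; run: pip install PyPDF2")
    else:
        print("PyPDF2 version:", found)
```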

Extracting Text from a PDF

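To extract text from a PDF file with PyPDF2, you can use the PdfReader class. In PyPDF2 3.x, the older PdfFileReader, getNumPages, getPage and extractText names found in many tutorials no longer work:

import PyPDF2

# Open in binary mode; the file must stay open while pages are read
with open('example.pdf', 'rb') as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)

    # pdf_reader.pages is a sequence of page objects
    for page_obj in pdf_reader.pages:
        print(page_obj.extract_text())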
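
Pages that contain only images (for example, scans) typically yield no text. As a rough sketch, the loop above can be wrapped in a helper that keeps the page number with each result and skips empty pages. It assumes PyPDF2 3.x, where the reader class is PdfReader and each page has an extract_text() method; the function names collect_page_text and read_pdf_text are hypothetical.

```python
def collect_page_text(pages):
    """Return (page_number, text) pairs for pages that contain text.

    `pages` is any iterable of objects with an extract_text() method,
    such as PdfReader(...).pages. Page numbers start at 1.
    """
    results = []
    for number, page in enumerate(pages, start=1):
        # Treat a missing result like an empty string, then trim whitespace
        text = (page.extract_text() or "").strip()
        if text:
            results.append((number, text))
    return results


def read_pdf_text(path):
    """Open a PDF at `path` and return its non-empty page texts."""
    # Imported here so collect_page_text can be used without PyPDF2
    from PyPDF2 import PdfReader

    with open(path, "rb") as pdf_file:
        reader = PdfReader(pdf_file)
        # Extract while the file is still open
        return collect_page_text(reader.pages)
```

Calling read_pdf_text('example.pdf') would then return a list that can be looped over to print or save the text page by page.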

Merging PDFs

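To merge multiple PDFs into one using PyPDF2, you can use the PdfMerger class (the older PdfFileMerger name no longer works in PyPDF2 3.x):

import PyPDF2

pdf_merger = PyPDF2.PdfMerger()

# Files are added in the order they should appear in the output
pdf_merger.append('file1.pdf')
pdf_merger.append('file2.pdf')

with open('merged_files.pdf', 'wb') as pdf_output_file:
    pdf_merger.write(pdf_output_file)

# Release the input files held by the merger
pdf_merger.close()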
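
The append method also accepts a pages argument, given as a (start, stop) tuple with zero-based indexing, to merge only part of a file. Here is a minimal sketch that merges a mix of whole files and page ranges. It assumes PyPDF2 3.x and its PdfMerger class; the function name merge_pdfs and its merger parameter are hypothetical, the latter added only so a different merger object can be supplied.

```python
def merge_pdfs(sources, output_path, merger=None):
    """Merge PDFs into one file.

    `sources` is an iterable whose items are either a path (whole file)
    or a (path, (start, stop)) tuple selecting a zero-based page range.
    `merger` is a hypothetical hook; by default a PyPDF2 PdfMerger is used.
    """
    if merger is None:
        from PyPDF2 import PdfMerger  # assumes PyPDF2 3.x
        merger = PdfMerger()

    try:
        for source in sources:
            if isinstance(source, tuple):
                path, page_range = source
                merger.append(path, pages=page_range)
            else:
                merger.append(source)

        with open(output_path, "wb") as output_file:
            merger.write(output_file)
    finally:
        # Always release the input files, even if merging fails
        merger.close()
```

For example, merge_pdfs(['file1.pdf', ('file2.pdf', (0, 2))], 'merged_files.pdf') would combine all of file1.pdf with the first two pages of file2.pdf.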

Conclusion

PyPDF2 is a useful library for working with PDF files in Python. With a few lines of code, you can extract text from a PDF or merge multiple PDFs into one.